datacreator-sdk 0.1.5__tar.gz

@@ -0,0 +1,114 @@
+ Metadata-Version: 2.4
+ Name: datacreator-sdk
+ Version: 0.1.5
+ Summary: Python SDK for the DataCreator AI dataset generation API
+ Author-email: DataCreator AI <team@datacreatorai.com>
+ License: MIT
+ Project-URL: Homepage, https://datacreatorai.com
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: requests>=2.32
+
+ # DataCreator AI – Software Development Kit (SDK)
+
+ A lightweight, proprietary Python client for accessing the **DataCreator AI** synthetic data generation API. This SDK is designed for desktop and server environments, enabling programmatic dataset generation with simple Python calls.
+
+ ---
+
+ ## Installation
+
+ <!-- ### From TestPyPI (recommended during beta-testing)
+
+ ```bash
+ pip install -i https://test.pypi.org/simple --extra-index-url https://pypi.org/simple datacreator-sdk
+ ``` -->
+
+ ## Usage Overview
+
+ The `DataCreatorClient` provides a single primary method, `generate()`, that generates conversations suitable for fine-tuning AI models. The SDK handles communication, progress messages, response validation, and writing the final dataset to a `.jsonl` file.
+
+ ---
+
+ ## `generate()` Method
+
+ The `generate()` function triggers the synthetic data generation workflow and saves the final dataset locally.
+
+ ### **Purpose**
+
+ Initiates a dataset generation request and produces a `.jsonl` file containing structured conversational examples aligned with your specified theme.
+
+ ### **Authentication**
+ Please contact us at team@datacreatorai.com for an API key. Once you've received your unique key, pass it when instantiating the client, as shown in the example code.
+
+ ### **Parameters**
+
+ The method accepts the following arguments:
+
+ - **`main_theme` (str, required)**
+   The central topic around which the synthetic dataset should be generated.
+   Example: `"Daily tasks and personal planning for young working professionals"`
+   Any harmless and non-NSFW topic is allowed.
+
+ - **`num_of_turns` (int, optional, default: 3)**
+   Number of conversational turns per data point. The maximum number of turns allowed is 5.
+   A "turn" is typically one `user → assistant` exchange. The generation supports multiple message roles including **user**, **assistant**, **system**, and **tool**.
+   Example: `2` turns produce:
+   - user
+   - assistant
+   - user
+   - assistant
+
+ - **`num_of_datapoints` (int, optional, default: 100)**
+   Number of independent dialogues/conversations to generate.
+   The maximum number of data points allowed per generation is 1000.
+   Example: `10` produces 10 conversations.
+
+ - **`language` (str, optional, default: "English")**
+   The language in which the dataset should be generated.
+
+ - **`system_prompt` (str, optional)**
+   A custom instruction or persona that the assistant should follow throughout the conversations.
+
+ - **`max_tokens` (int, optional, default: 2048)**
+   The maximum token limit for each conversation.
+
+ - **`use_rolling_temperatures` (bool, optional, default: False)**
+   When set to `True`, the generator uses varying temperatures across data points to ensure **higher lexical diversity**.
+
+ - **`use_model_rotation` (bool, optional, default: False)**
+   When set to `True`, the system rotates between different high-quality models to provide **better structural diversity**.
+
+ - **`output_file` (str, optional, default: "dataset.jsonl")**
+   The path and filename where the final dataset will be written.
+   Example: `"data.jsonl"`
+
+ The final dataset is a **JSON Lines (`.jsonl`) file**, where each line represents a conversation formatted as a list of messages. This format is fully compatible with fine-tuning workflows for providers like OpenAI, Mistral, and Anthropic.
+
+ The SDK supports the following message roles:
+ - **`system`**: Sets the context or behavior of the assistant.
+ - **`user`**: The human/user prompt.
+ - **`assistant`**: The model's response.
+ - **`tool`**: Represents tool outputs or function calls (experimental).
+
+ ### Example Code
+
+ ```python
+
+ import os
+ from dotenv import load_dotenv
+ from datacreatoraisdk import DataCreatorClient
+
+ load_dotenv()
+
+ # Example usage
+ if __name__ == "__main__":
+     client = DataCreatorClient(api_key=os.getenv("DATACREATOR_API_KEY"))
+     client.generate(
+         main_theme="Natural dialogues between a user and assistant asking about daily tasks, errands, and emotions.",
+         num_of_turns=2,
+         num_of_datapoints=100,
+         use_rolling_temperatures=True,
+         use_model_rotation=False
+     )
+
+ ```
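
The README above documents limits on several `generate()` arguments (a required theme, at most 5 turns, at most 1000 data points), but nothing in this release shows whether they are checked before a request is sent. A minimal sketch of a client-side check follows. The helper `validate_generate_args` is hypothetical and not part of the SDK, and the lower bound of 1 is an assumption, since the README states only maximums.

```python
# Hypothetical helper; not part of datacreator-sdk.
MAX_TURNS = 5          # documented maximum for num_of_turns
MAX_DATAPOINTS = 1000  # documented maximum for num_of_datapoints


def validate_generate_args(main_theme, num_of_turns=3, num_of_datapoints=100):
    """Return the arguments as a dict, or raise ValueError if they are out of range."""
    if not isinstance(main_theme, str) or not main_theme.strip():
        raise ValueError("main_theme is required and must be a non-empty string")
    # Assumption: at least one turn and one data point are needed.
    if not 1 <= num_of_turns <= MAX_TURNS:
        raise ValueError(f"num_of_turns must be between 1 and {MAX_TURNS}")
    if not 1 <= num_of_datapoints <= MAX_DATAPOINTS:
        raise ValueError(f"num_of_datapoints must be between 1 and {MAX_DATAPOINTS}")
    return {
        "main_theme": main_theme,
        "num_of_turns": num_of_turns,
        "num_of_datapoints": num_of_datapoints,
    }
```

The returned dict could be unpacked into the call, for example `client.generate(**validate_generate_args("Meal planning", num_of_turns=2))`, so an out-of-range value fails locally instead of after a network round trip.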
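
The README describes the output as a JSON Lines file in which each line is one conversation made of messages with the roles `system`, `user`, `assistant`, and `tool`. The exact record shape is not shown in this release, so the sketch below is an assumption: it accepts either a bare JSON list of messages or an object with a `messages` key, and it assumes every message carries a `role` field. The function names are illustrative only.

```python
import json
from collections import Counter
from pathlib import Path

# Roles listed in the README.
KNOWN_ROLES = {"system", "user", "assistant", "tool"}


def load_conversations(path):
    """Read a .jsonl dataset and return a list of conversations (lists of messages)."""
    conversations = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Assumption: a line is either a list of messages or {"messages": [...]}.
            messages = record["messages"] if isinstance(record, dict) else record
            for message in messages:
                if message.get("role") not in KNOWN_ROLES:
                    raise ValueError(f"line {line_no}: unexpected role {message.get('role')!r}")
            conversations.append(messages)
    return conversations


def count_roles(conversations):
    """Count how many messages of each role appear across all conversations."""
    return Counter(m["role"] for conv in conversations for m in conv)
```

Running `count_roles(load_conversations("dataset.jsonl"))` on a generated file gives a quick check that the number of `user` and `assistant` messages matches the requested `num_of_turns` and `num_of_datapoints`.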
@@ -0,0 +1,9 @@
+ README.md
+ pyproject.toml
+ datacreator_sdk.egg-info/PKG-INFO
+ datacreator_sdk.egg-info/SOURCES.txt
+ datacreator_sdk.egg-info/dependency_links.txt
+ datacreator_sdk.egg-info/requires.txt
+ datacreator_sdk.egg-info/top_level.txt
+ datacreatoraisdk/__init__.py
+ datacreatoraisdk/client.py
@@ -0,0 +1 @@
+ requests>=2.32
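
The requirements above list only `requests`, while the README example imports `dotenv` (distributed on PyPI as `python-dotenv`), so that example raises `ImportError` in an environment holding only the declared dependencies. One possible workaround, sketched here with a hypothetical helper `load_api_key` that is not part of the SDK, treats `dotenv` as optional and reads the key from the environment either way:

```python
import os


def load_api_key(var_name="DATACREATOR_API_KEY"):
    """Return the API key from the environment, loading a .env file if dotenv is available."""
    try:
        # Optional: python-dotenv is not a declared dependency of the SDK.
        from dotenv import load_dotenv
    except ImportError:
        pass
    else:
        load_dotenv()
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key
```

The client could then be built as `DataCreatorClient(api_key=load_api_key())`, which fails early with a clear message instead of sending a request with `api_key=None`.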
@@ -0,0 +1 @@
+ datacreatoraisdk
@@ -0,0 +1,3 @@
+ from .client import DataCreatorClient
+
+ __all__ = ["DataCreatorClient"]
@@ -0,0 +1,68 @@
+ import json
+ import time
+ import requests
+ from pathlib import Path
+ import datetime
+
+ class DataCreatorClient:
+     def __init__(self, api_key: str):
+         self.api_key = api_key
+         self.base_url = "https://synthetic-backend-basic-mmlcpe4eca-uc.a.run.app"
+
+     def generate(
+         self,
+         main_theme: str,
+         num_of_datapoints: int = 100,
+         num_of_turns: int = 3,
+         max_tokens: int = 2048,
+         language = "English",
+         system_prompt: str = "",
+         output_file: str = "dataset.jsonl",
+         use_rolling_temperatures = False,
+         use_model_rotation = False
+     ):
+         payload = {
+             "api_key": self.api_key,
+             "main_theme": main_theme,
+             "num_of_datapoints": num_of_datapoints,
+             "num_of_turns": num_of_turns,
+             "max_tokens": max_tokens,
+             "language": language,
+             "system_prompt": system_prompt,
+             "use_rolling_temperatures": use_rolling_temperatures,
+             "use_model_rotation": use_model_rotation
+         }
+
+         try:
+             steps = [
+                 "Performing preliminary checks for your main theme...",
+                 "Generating unique sub-themes for holistic data generation..",
+                 "Generating your dataset.This may take a few minutes. Please wait..",
+                 "Preparing your file..."
+             ]
+
+             for step in steps:
+                 print(step)
+                 time.sleep(0.2)
+
+             resp = requests.post(f"{self.base_url}/api/chat-data", json=payload)
+             if resp.status_code != 200:
+                 raise Exception(f"{resp.text}")
+
+             # Save file
+             timestamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
+             # print(timestamp)
+             filename = f"dataset_downloaded_{timestamp}.jsonl"  # or read from headers
+             with open(filename, "wb") as f:
+                 f.write(resp.content)
+
+             print("File downloaded:", filename)
+         except Exception as e:
+             print(f"{str(e)}")
+
+     def _download_file(self, url: str, output_file: str):
+         with requests.get(url, stream=True) as r:
+             r.raise_for_status()
+             with open(output_file, "wb") as f:
+                 for chunk in r.iter_content(chunk_size=8192):
+                     f.write(chunk)
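
In the hunk above, `generate()` accepts `output_file` but writes to a timestamped filename instead, calls `requests.post` without a timeout, and catches every exception only to print it. The sketch below shows one way the request-and-save step could behave as the README describes. It is not the SDK's implementation: `post_and_save`, `GenerationError`, and the 600-second default timeout are all assumptions, and `http` stands for any object with a `requests`-style `post()` method, such as the `requests` module or a `requests.Session`.

```python
from pathlib import Path


class GenerationError(RuntimeError):
    """Raised when the API answers with a non-200 status (hypothetical name)."""


def post_and_save(http, url, payload, output_file="dataset.jsonl", timeout=600.0):
    """POST the payload and write the response body to output_file.

    `http` is anything exposing post(url, json=..., timeout=...),
    for example the `requests` module itself.
    """
    resp = http.post(url, json=payload, timeout=timeout)
    if resp.status_code != 200:
        # Raise instead of printing so callers can react to failures.
        raise GenerationError(f"HTTP {resp.status_code}: {resp.text}")
    path = Path(output_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(resp.content)  # honor the caller's requested path
    return path
```

With `requests` installed, a call would look like `post_and_save(requests, f"{base_url}/api/chat-data", payload, output_file="data.jsonl")`. Passing the HTTP object in also makes the step easy to exercise with a stub instead of the live endpoint.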
@@ -0,0 +1,23 @@
+ [project]
+ name = "datacreator-sdk"
+ version = "0.1.5"
+ description = "Python SDK for the DataCreator AI dataset generation API"
+ authors = [
+     { name="DataCreator AI", email="team@datacreatorai.com" }
+ ]
+ readme = "README.md"
+ license = { text = "MIT" }
+ requires-python = ">=3.8"
+
+ dynamic = []
+
+ dependencies = [
+     "requests>=2.32"
+ ]
+
+ [project.urls]
+ Homepage = "https://datacreatorai.com"
+
+ [build-system]
+ requires = ["setuptools>=77.0.0", "wheel"]
+ build-backend = "setuptools.build_meta"
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+