massgen 0.1.3__py3-none-any.whl → 0.1.4__py3-none-any.whl
Potentially problematic release.
This version of massgen might be problematic.
- massgen/__init__.py +1 -1
- massgen/api_params_handler/_chat_completions_api_params_handler.py +4 -0
- massgen/api_params_handler/_claude_api_params_handler.py +4 -0
- massgen/api_params_handler/_gemini_api_params_handler.py +4 -0
- massgen/api_params_handler/_response_api_params_handler.py +4 -0
- massgen/backend/base_with_custom_tool_and_mcp.py +25 -5
- massgen/backend/docs/permissions_and_context_files.md +2 -2
- massgen/backend/response.py +2 -0
- massgen/configs/README.md +49 -40
- massgen/configs/tools/custom_tools/crawl4ai_example.yaml +55 -0
- massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_multi.yaml +61 -0
- massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_single.yaml +29 -0
- massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_multi.yaml +51 -0
- massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_single.yaml +33 -0
- massgen/configs/tools/custom_tools/multimodal_tools/text_to_speech_generation_multi.yaml +55 -0
- massgen/configs/tools/custom_tools/multimodal_tools/text_to_speech_generation_single.yaml +33 -0
- massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_multi.yaml +47 -0
- massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_single.yaml +29 -0
- massgen/configs/tools/custom_tools/multimodal_tools/understand_audio.yaml +1 -1
- massgen/configs/tools/custom_tools/multimodal_tools/understand_file.yaml +1 -1
- massgen/configs/tools/custom_tools/multimodal_tools/understand_image.yaml +1 -1
- massgen/configs/tools/custom_tools/multimodal_tools/understand_video.yaml +1 -1
- massgen/configs/tools/custom_tools/multimodal_tools/youtube_video_analysis.yaml +1 -1
- massgen/filesystem_manager/_filesystem_manager.py +1 -0
- massgen/filesystem_manager/_path_permission_manager.py +148 -0
- massgen/message_templates.py +160 -12
- massgen/orchestrator.py +16 -0
- massgen/tests/test_binary_file_blocking.py +274 -0
- massgen/tests/test_case_studies.md +12 -12
- massgen/tests/test_multimodal_size_limits.py +407 -0
- massgen/tool/_manager.py +7 -2
- massgen/tool/_multimodal_tools/image_to_image_generation.py +293 -0
- massgen/tool/_multimodal_tools/text_to_file_generation.py +455 -0
- massgen/tool/_multimodal_tools/text_to_image_generation.py +222 -0
- massgen/tool/_multimodal_tools/text_to_speech_continue_generation.py +226 -0
- massgen/tool/_multimodal_tools/text_to_speech_transcription_generation.py +217 -0
- massgen/tool/_multimodal_tools/text_to_video_generation.py +223 -0
- massgen/tool/_multimodal_tools/understand_audio.py +19 -1
- massgen/tool/_multimodal_tools/understand_file.py +6 -1
- massgen/tool/_multimodal_tools/understand_image.py +112 -8
- massgen/tool/_multimodal_tools/understand_video.py +32 -5
- massgen/tool/_web_tools/crawl4ai_tool.py +718 -0
- massgen/tool/docs/multimodal_tools.md +589 -0
- {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/METADATA +96 -69
- {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/RECORD +49 -40
- massgen/configs/tools/custom_tools/crawl4ai_mcp_example.yaml +0 -67
- massgen/configs/tools/custom_tools/crawl4ai_multi_agent_example.yaml +0 -68
- massgen/configs/tools/custom_tools/multimodal_tools/playwright_with_img_understanding.yaml +0 -98
- massgen/configs/tools/custom_tools/multimodal_tools/understand_video_example.yaml +0 -54
- massgen/configs/tools/memory/README.md +0 -199
- massgen/configs/tools/memory/gpt5mini_gemini_context_window_management.yaml +0 -131
- massgen/configs/tools/memory/gpt5mini_gemini_no_persistent_memory.yaml +0 -133
- massgen/configs/tools/memory/test_context_window_management.py +0 -286
- massgen/configs/tools/multimodal/gpt5mini_gpt5nano_documentation_evolution.yaml +0 -97
- {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/WHEEL +0 -0
- {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/entry_points.txt +0 -0
- {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/licenses/LICENSE +0 -0
- {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/top_level.txt +0 -0

@@ -769,6 +769,594 @@ Error: File does not appear to be a video file

---

## Image Generation Tools

### text_to_image_generation

**What it does**: Generates images from text descriptions using OpenAI's GPT-4.1 API, **without any input images**. It creates new images from scratch based solely on a text prompt.

**Why use it**: Lets agents create original visual content from a description, such as illustrations, concept art, product visualizations, or other creative visuals.

**Location**: `massgen.tool._multimodal_tools.text_to_image_generation`

#### Parameters

- `prompt` (required): Text description of the image to generate
  - Be specific and detailed for better results
  - Include style, composition, lighting, and mood details
- `model` (optional): Model to use (default: "gpt-4.1")
  - Options: "gpt-4.1"
- `storage_path` (optional): Directory in which to save the image
  - **IMPORTANT**: Must be a DIRECTORY path only, NOT a file path
  - Example: "images/generated", NOT "images/cat.png"
  - The filename is generated automatically from the prompt and a timestamp
  - Relative path: resolved relative to the agent's workspace
  - Absolute path: must be within the allowed directories
  - None/empty: the image is saved to the agent's workspace root
- `allowed_paths` (optional): List of allowed base paths for validation
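
The `storage_path` rules above can be sketched as a small resolver. This is a hypothetical illustration, not massgen's implementation. Three things are assumptions: the `resolve_storage_dir` name, the rule that any path with a file suffix counts as a file path, and the containment check that also applies to relative paths and treats the workspace as allowed.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_storage_dir(
    storage_path: Optional[str],
    workspace: Path,
    allowed_paths: Optional[Sequence[str]] = None,
) -> Path:
    """Hypothetical resolver mirroring the documented storage_path rules."""
    workspace = workspace.resolve()
    # None/empty: save to the workspace root
    if not storage_path:
        return workspace
    candidate = Path(storage_path)
    # Assumption: a suffix such as ".png" means a file path was passed
    if candidate.suffix:
        raise ValueError(f"storage_path must be a directory, got {storage_path!r}")
    # Relative paths are resolved against the agent's workspace
    if not candidate.is_absolute():
        candidate = workspace / candidate
    resolved = candidate.resolve()
    # The result must sit inside the workspace or an allowed base directory
    # (assumption: this also stops relative paths like "../x" from escaping)
    bases = [workspace] + [Path(p).resolve() for p in (allowed_paths or [])]
    if not any(resolved.is_relative_to(base) for base in bases):
        raise PermissionError(f"{resolved} is outside the allowed directories")
    return resolved
```

For example, `resolve_storage_dir("images/generated", workspace)` returns `<workspace>/images/generated`, while `"images/cat.png"` raises `ValueError`. `Path.is_relative_to` requires Python 3.9 or later.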

#### Returns

ExecutionResult containing:

- `success`: Whether the operation succeeded
- `operation`: "generate_and_store_image_no_input_images"
- `note`: Note about the operation
- `images`: List of generated images with file paths and metadata
- `model`: Model used for generation
- `prompt`: The prompt used for generation
- `total_images`: Total number of images generated and saved

#### Security Features

- Requires a valid OpenAI API key
- Files are saved to the specified path within the workspace
- The path must be within the allowed directories
- Automatic timestamp-based filename generation
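
The filenames in the sample output in the Examples below (for example `20240115_143022_a_cat_in_space.png`) suggest a `<timestamp>_<slug>.<ext>` pattern. Below is a minimal sketch of that pattern. The helper name, the slug rules (lowercase, with runs of other characters collapsed to `_`), and the 50-character cap are all assumptions. The real tool may shorten prompts differently.

```python
import re
from datetime import datetime


def make_output_filename(prompt: str, extension: str, when: datetime, max_slug: int = 50) -> str:
    """Hypothetical: build '<YYYYmmdd_HHMMSS>_<slug>.<ext>' from a prompt."""
    # Lowercase, then collapse every run of non-alphanumeric characters into "_"
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Cap the slug length and fall back to a fixed name if nothing survives
    slug = slug[:max_slug].rstrip("_") or "output"
    return f"{when:%Y%m%d_%H%M%S}_{slug}.{extension.lstrip('.')}"


print(make_output_filename("a cat in space", "png", datetime(2024, 1, 15, 14, 30, 22)))
# 20240115_143022_a_cat_in_space.png
```

Second-level timestamps can collide if the same prompt is saved twice within one second. A more defensive version might append a counter.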

#### Examples

**Basic Image Generation**:

```python
import asyncio

from massgen.tool._multimodal_tools import text_to_image_generation


async def main():
    # Generate an image from a text description
    result = await text_to_image_generation(prompt="a cat in space")

    # The output includes the file path and metadata
    print(result.output_blocks[0].data)
    # {
    #   "success": true,
    #   "operation": "generate_and_store_image_no_input_images",
    #   "images": [{
    #     "file_path": "/workspace/20240115_143022_a_cat_in_space.png",
    #     "filename": "20240115_143022_a_cat_in_space.png",
    #     "size": 125340
    #   }],
    #   "total_images": 1
    # }


asyncio.run(main())
```
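
The sample output above suggests that `output_blocks[0].data` holds a JSON string. Assuming that shape, a caller could collect the saved file paths as shown below. Check the actual block type in your version first. `saved_image_paths` is a hypothetical helper, and the `error` key it reads on failure is also an assumption.

```python
import json
from typing import List


def saved_image_paths(data: str) -> List[str]:
    """Hypothetical helper: extract file paths from the JSON payload shown above."""
    payload = json.loads(data)
    # Surface failures instead of silently returning an empty list
    if not payload.get("success", False):
        raise RuntimeError(payload.get("error", "image generation failed"))
    return [image["file_path"] for image in payload.get("images", [])]


sample = """{
  "success": true,
  "operation": "generate_and_store_image_no_input_images",
  "images": [{"file_path": "/workspace/20240115_143022_a_cat_in_space.png",
              "filename": "20240115_143022_a_cat_in_space.png", "size": 125340}],
  "total_images": 1
}"""
print(saved_image_paths(sample))
# ['/workspace/20240115_143022_a_cat_in_space.png']
```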

**Custom Storage Path**:

```python
# Inside an async function: generate with a custom storage location
result = await text_to_image_generation(
    prompt="sunset over mountains",
    storage_path="art/landscapes"
)
```

**Configuration Example**:

```yaml
# massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_single.yaml
agents:
  - id: "image_generator"
    backend:
      type: "openai"
      model: "gpt-4o"
      cwd: "workspace1"
      enable_image_generation: true
      custom_tools:
        - name: ["text_to_image_generation"]
          category: "multimodal"
          path: "massgen/tool/_multimodal_tools/text_to_image_generation.py"
          function: ["text_to_image_generation"]
```

**CLI Usage**:

```bash
massgen --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_single.yaml "Generate an image of a futuristic city at night"
```

---

### image_to_image_generation

**What it does**: Creates variations from one or more input images using OpenAI's GPT-4.1 API, generating new images inspired by the existing ones.

**Why use it**: Lets agents create variations, mashups, or transformations of existing images, for example for style transfer, image editing, or design variations.

**Location**: `massgen.tool._multimodal_tools.image_to_image_generation`

#### Parameters

- `base_image_paths` (required): List of paths to the base images
  - Supported formats: PNG, JPEG (less than 4MB each)
  - Relative paths: resolved relative to the workspace
  - Absolute paths: must be within the allowed directories
- `prompt` (optional): Text description of the variation (default: "Create a variation of the provided images")
- `model` (optional): Model to use (default: "gpt-4.1")
- `storage_path` (optional): Directory in which to save the variations
  - **IMPORTANT**: Must be a DIRECTORY path only
  - The filename is generated automatically
- `allowed_paths` (optional): List of allowed base paths for validation

#### Returns

ExecutionResult containing:

- `success`: Whether the operation succeeded
- `operation`: "generate_and_store_image_with_input_images"
- `note`: Note about usage
- `images`: List of generated images with file paths and metadata
- `model`: Model used for generation
- `prompt`: The prompt used
- `total_images`: Total number of images generated

#### Security Features

- Requires a valid OpenAI API key
- Input images must be valid image files smaller than 4MB
- Files are saved to the specified path within the workspace
- Paths are validated against the allowed directories
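
The PNG/JPEG and 4MB requirements for input images can be checked locally before any API call. Below is a minimal pre-flight sketch. It assumes "4MB" means 4 MiB and identifies formats by their file signatures rather than by extensions. `check_base_image` is hypothetical and not part of massgen.

```python
from pathlib import Path

MAX_BYTES = 4 * 1024 * 1024  # "4MB" read as 4 MiB (assumption)
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",  # PNG file signature
    b"\xff\xd8\xff": "jpeg",  # JPEG start-of-image marker
}


def check_base_image(path: Path) -> str:
    """Hypothetical pre-flight check: return 'png' or 'jpeg', else raise ValueError."""
    size = path.stat().st_size
    # The documented limit is "less than 4MB", so reject anything at or above it
    if size >= MAX_BYTES:
        raise ValueError(f"{path.name} is {size} bytes; it must be under {MAX_BYTES}")
    with path.open("rb") as fh:
        header = fh.read(8)
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    raise ValueError(f"{path.name} is not a PNG or JPEG file")
```

Running it over every entry of `base_image_paths` before calling the tool turns an API-side rejection into an immediate, local error.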

#### Examples

**Create Image Variation**:

```python
import asyncio

from massgen.tool._multimodal_tools import image_to_image_generation


async def main():
    # Generate a variation from a single image
    result = await image_to_image_generation(
        base_image_paths=["logo.png"],
        prompt="Create a modern variation of this logo"
    )


asyncio.run(main())
```

**Combine Multiple Images**:

```python
# Inside an async function: generate a variation combining multiple images
result = await image_to_image_generation(
    base_image_paths=["cat.png", "dog.png"],
    prompt="Combine these animals into a single creature"
)
```

---

## Video Generation Tools

### text_to_video_generation

**What it does**: Generates videos from text descriptions using OpenAI's Sora-2 API, turning detailed scene descriptions into video clips.

**Why use it**: Lets agents create video content from descriptions, for example marketing content, concept visualizations, educational videos, or social media clips.

**Location**: `massgen.tool._multimodal_tools.text_to_video_generation`

#### Parameters

- `prompt` (required): Text description of the video to generate
  - Include scene details, camera movements, lighting, and atmosphere
  - Be specific about actions, objects, and environment
- `model` (optional): Model to use (default: "sora-2")
- `seconds` (optional): Video duration in seconds (default: 4)
  - Supported range: 4-20 seconds
- `storage_path` (optional): Directory in which to save the video
  - **IMPORTANT**: Must be a DIRECTORY path only
  - The filename is generated automatically from the prompt and a timestamp
  - Relative path: resolved relative to the workspace
  - Absolute path: must be within the allowed directories
- `allowed_paths` (optional): List of allowed base paths for validation

#### Returns

ExecutionResult containing:

- `success`: Whether the operation succeeded
- `operation`: "generate_and_store_video_no_input_images"
- `video_path`: Path to the saved video file
- `filename`: Name of the generated file
- `size`: File size in bytes
- `model`: Model used for generation
- `prompt`: The prompt used
- `duration`: Time taken for generation, in seconds

#### Security Features

- Requires a valid OpenAI API key with Sora-2 access
- Files are saved to the specified path within the workspace
- The video is downloaded and stored automatically

#### Examples

**Basic Video Generation**:

```python
import asyncio

from massgen.tool._multimodal_tools import text_to_video_generation


async def main():
    # Generate a 4-second video (the default duration)
    result = await text_to_video_generation(
        prompt="A cool cat on a motorcycle in the night"
    )

    # The output includes the video path
    print(result.output_blocks[0].data)
    # {
    #   "success": true,
    #   "operation": "generate_and_store_video_no_input_images",
    #   "video_path": "/workspace/20240115_143022_a_cool_cat_on_motorcycle.mp4",
    #   "size": 5242880,
    #   "duration": 45.2
    # }


asyncio.run(main())
```

**Detailed Scene with Duration**:

```python
# Inside an async function: generate with a detailed prompt and custom duration
result = await text_to_video_generation(
    prompt="Neon-lit alley at night, light rain, slow push-in, cinematic lighting",
    seconds=10,
    storage_path="videos/cinematic"
)
```

**Configuration Example**:

```yaml
# massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_single.yaml
agents:
  - id: "video_generator"
    backend:
      type: "openai"
      model: "gpt-4o"
      cwd: "workspace1"
      enable_video_generation: true
      custom_tools:
        - name: ["text_to_video_generation"]
          category: "multimodal"
          path: "massgen/tool/_multimodal_tools/text_to_video_generation.py"
          function: ["text_to_video_generation"]
```

**CLI Usage**:

```bash
massgen --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_single.yaml "Generate a 4-second video of a neon-lit alley at night, light rain, slow push-in, cinematic."
```

---

## Audio/Speech Generation Tools

### text_to_speech_continue_generation

**What it does**: Generates expressive speech from text using OpenAI's GPT-4o Audio Preview model, producing natural-sounding speech with emotional expression and awareness of context.

**Why use it**: Lets agents generate speech with an emotional tone, for example for voice-overs, narration, audiobooks, or any content that calls for natural, expressive delivery.

**Location**: `massgen.tool._multimodal_tools.text_to_speech_continue_generation`

#### Parameters

- `prompt` (required): Text content to convert to speech
- `model` (optional): Model to use (default: "gpt-4o-audio-preview")
- `voice` (optional): Voice to use (default: "alloy")
  - Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer"
- `audio_format` (optional): Output audio format (default: "wav")
  - Options: "wav", "mp3", "opus", "aac", "flac"
- `storage_path` (optional): Directory in which to save the audio
  - **IMPORTANT**: Must be a DIRECTORY path only
  - The filename is generated automatically from the prompt and a timestamp
- `allowed_paths` (optional): List of allowed base paths for validation

#### Returns

ExecutionResult containing:

- `success`: Whether the operation succeeded
- `operation`: "generate_and_store_audio_no_input_audios"
- `audio_file`: Generated audio file with path and metadata
- `model`: Model used for generation
- `prompt`: The prompt used for generation
- `voice`: Voice used for generation
- `format`: Audio format used
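
OpenAI's Chat Completions API returns audio from gpt-4o-audio-preview as base64-encoded data (in `choices[0].message.audio.data`). A tool like this one presumably decodes that data before saving it. The sketch below covers only this last step. `save_base64_audio` and its `<stem>.<format>` naming are hypothetical and do not come from massgen's code.

```python
import base64
from pathlib import Path


def save_base64_audio(b64_data: str, directory: Path, stem: str, audio_format: str = "wav") -> Path:
    """Hypothetical: decode a base64 audio payload and write it as <stem>.<audio_format>."""
    # validate=True rejects characters outside the base64 alphabet
    audio_bytes = base64.b64decode(b64_data, validate=True)
    if not audio_bytes:
        raise ValueError("empty audio payload")
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{stem}.{audio_format}"
    path.write_bytes(audio_bytes)
    return path
```

The file extension should match the `audio_format` that was requested from the API. Otherwise players may misidentify the file.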

#### Examples

**Expressive Speech**:

```python
import asyncio

from massgen.tool._multimodal_tools import text_to_speech_continue_generation


async def main():
    # Generate expressive speech
    result = await text_to_speech_continue_generation(
        prompt="I want you to tell me a very short introduction about Sherlock Holmes in one sentence, and I want you to use an emotional voice to read it out loud."
    )


asyncio.run(main())
```

**Custom Voice and Format**:

```python
# Inside an async function: generate with a specific voice and format
result = await text_to_speech_continue_generation(
    prompt="Hello world",
    voice="nova",
    audio_format="mp3",
    storage_path="audio/generated"
)
```

---

### text_to_speech_transcription_generation

**What it does**: Converts text directly to speech using OpenAI's TTS API with a streaming response, providing fast, cost-effective text-to-speech conversion.

**Why use it**: Lets agents convert text to speech quickly, for example to turn transcripts into audio, produce simple voice-overs, or whenever expressive emotion is not required.

**Location**: `massgen.tool._multimodal_tools.text_to_speech_transcription_generation`

#### Parameters

- `input_text` (required): The text content to convert to speech
- `model` (optional): TTS model to use (default: "gpt-4o-mini-tts")
  - Options: "gpt-4o-mini-tts", "tts-1", "tts-1-hd"
- `voice` (optional): Voice to use (default: "alloy")
  - Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "sage"
- `instructions` (optional): Speaking instructions for tone and style
  - Example: "Speak in a cheerful tone"
  - OpenAI documents that instructions do not work with "tts-1" or "tts-1-hd", so use "gpt-4o-mini-tts" when you need them
- `storage_path` (optional): Directory in which to save the audio file
  - **IMPORTANT**: Must be a DIRECTORY path only
- `audio_format` (optional): Output audio format (default: "mp3")
  - Options: "mp3", "opus", "aac", "flac", "wav", "pcm"
- `allowed_paths` (optional): List of allowed base paths for validation
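
The parameters above correspond to the request fields of OpenAI's speech endpoint: `model`, `voice`, `input`, `instructions`, and `response_format`. The sketch below validates arguments against the option lists in this section. It also omits `instructions` for "tts-1" and "tts-1-hd", which OpenAI documents as not supporting them. `build_speech_request` is hypothetical; massgen's tool may assemble its request differently.

```python
from typing import Dict, Optional

MODELS = {"gpt-4o-mini-tts", "tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "sage"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
NO_INSTRUCTIONS = {"tts-1", "tts-1-hd"}  # models that do not support `instructions`


def build_speech_request(
    input_text: str,
    model: str = "gpt-4o-mini-tts",
    voice: str = "alloy",
    audio_format: str = "mp3",
    instructions: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical: validate tool arguments and build a speech request payload."""
    if not input_text.strip():
        raise ValueError("input_text must not be empty")
    for name, value, allowed in (
        ("model", model, MODELS),
        ("voice", voice, VOICES),
        ("audio_format", audio_format, FORMATS),
    ):
        if value not in allowed:
            raise ValueError(f"unsupported {name}: {value!r}")
    request = {"model": model, "voice": voice, "input": input_text, "response_format": audio_format}
    # Forward instructions only to a model that supports them
    if instructions and model not in NO_INSTRUCTIONS:
        request["instructions"] = instructions
    return request


print(build_speech_request("Hello!", voice="coral", instructions="Speak cheerfully."))
```

Silently dropping `instructions` keeps the call working on older models. A stricter design could raise instead, so that the caller notices the style request was ignored.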

#### Returns

ExecutionResult containing:

- `success`: Whether the operation succeeded
- `operation`: "convert_text_to_speech"
- `audio_file`: Generated audio file with path and metadata
- `model`: TTS model used
- `voice`: Voice used
- `format`: Audio format used
- `text_length`: Length of the input text
- `instructions`: Speaking instructions, if provided

#### Examples

**Simple Text-to-Speech**:

```python
import asyncio

from massgen.tool._multimodal_tools import text_to_speech_transcription_generation


async def main():
    # Convert text to speech
    result = await text_to_speech_transcription_generation(
        input_text="Hello world, this is a test."
    )


asyncio.run(main())
```

**With Instructions**:

```python
# Inside an async function: convert with a specific voice and instructions
result = await text_to_speech_transcription_generation(
    input_text="Today is a wonderful day to build something people love!",
    voice="coral",
    instructions="Speak in a cheerful and positive tone."
)
```

---

## File Generation Tools

### text_to_file_generation

**What it does**: Generates text content with the OpenAI API and saves it in one of several file formats (TXT, MD, PDF, PPTX), producing documents from text prompts.

**Why use it**: Lets agents create formatted documents automatically, for example reports, documentation, presentations, or other structured text content.

**Location**: `massgen.tool._multimodal_tools.text_to_file_generation`

#### Parameters

- `prompt` (required): Description of the content to generate
  - Be specific about structure, sections, and formatting
  - Example: "Write a technical report about AI"
- `file_format` (optional): Output file format (default: "txt")
  - Options: "txt", "md", "pdf", "pptx"
- `filename` (optional): Custom filename without an extension
  - If not provided, one is generated from the prompt and a timestamp
- `model` (optional): OpenAI model to use (default: "gpt-4o")
  - Options: "gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo"
- `storage_path` (optional): Directory in which to save the file
  - **IMPORTANT**: Must be a DIRECTORY path only
  - The filename comes from `filename` if given; otherwise it is generated from the prompt
  - Relative path: resolved relative to the workspace
  - Absolute path: must be within the allowed directories
- `allowed_paths` (optional): List of allowed base paths for validation

#### Returns

ExecutionResult containing:

- `success`: Whether the operation succeeded
- `operation`: "generate_and_store_file"
- `file_path`: Path to the generated file
- `filename`: Name of the generated file
- `file_format`: Format of the generated file
- `content_preview`: First 500 characters of the generated content
- `file_size`: Size of the generated file in bytes
- `model`: Model used for generation
- `prompt`: The prompt used

#### Security Features

- Requires a valid OpenAI API key
- Files are saved to the specified path within the workspace
- The path must be within the allowed directories

#### Dependencies

- PDF generation requires either the `reportlab` or the `fpdf2` library
- PPTX generation requires the `python-pptx` library

```bash
pip install reportlab    # For PDF (or: pip install fpdf2)
pip install python-pptx  # For PPTX
```
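
Because PDF output works with either library, a caller can check which one is installed before requesting `file_format="pdf"`. This sketch uses `importlib.util.find_spec`, which reports whether a module can be imported without actually importing it. Two details are assumptions: the `pick_pdf_backend` name and the preference for reportlab over fpdf2. Note that the fpdf2 package is imported as `fpdf`.

```python
import importlib.util
from typing import Callable, Optional

# pip distribution name -> importable module name (fpdf2 installs the "fpdf" module)
PDF_BACKENDS = (("reportlab", "reportlab"), ("fpdf2", "fpdf"))


def pick_pdf_backend(find_spec: Callable = importlib.util.find_spec) -> Optional[str]:
    """Return the first installed PDF library (reportlab preferred, by assumption)."""
    for dist_name, module_name in PDF_BACKENDS:
        if find_spec(module_name) is not None:
            return dist_name
    return None


backend = pick_pdf_backend()
if backend is None:
    print("PDF output unavailable: pip install reportlab (or fpdf2)")
else:
    print(f"PDF backend: {backend}")
```

The `find_spec` parameter can be replaced with a stub in tests, so the selection logic can be checked without either library installed.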

#### Examples

**Generate Markdown Document**:

```python
import asyncio

from massgen.tool._multimodal_tools import text_to_file_generation


async def main():
    # Generate a Markdown file
    result = await text_to_file_generation(
        prompt="Write a blog post about Python",
        file_format="md"
    )


asyncio.run(main())
```

**Generate PDF Report**:

```python
# Inside an async function: generate a PDF with a custom filename
result = await text_to_file_generation(
    prompt="Create a technical report on machine learning",
    file_format="pdf",
    filename="ml_report",
    storage_path="documents/reports"
)
```

**Generate PowerPoint Presentation**:

```python
# Inside an async function: generate a PPTX. Structure the prompt with
# slide titles (# or ##) and bullet points (-)
result = await text_to_file_generation(
    prompt="""Create a presentation about AI trends:

# Introduction
- Overview of AI landscape
- Key developments in 2024

# Current Trends
- Large Language Models
- Multimodal AI
- AI Safety

# Future Outlook
- Predictions for 2025
- Emerging technologies
""",
    file_format="pptx",
    filename="ai_trends_presentation"
)
```

**Configuration Example**:

```yaml
# massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_single.yaml
agents:
  - id: "document_generator"
    backend:
      type: "openai"
      model: "gpt-4o"
      cwd: "workspace1"
      enable_file_generation: true
      custom_tools:
        - name: ["text_to_file_generation"]
          category: "multimodal"
          path: "massgen/tool/_multimodal_tools/text_to_file_generation.py"
          function: ["text_to_file_generation"]
```

**CLI Usage**:

```bash
massgen --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_single.yaml "Generate a technical report about LLMs and save it as a PDF"
```

#### Note

- For the PPTX format, structure your prompt with slide titles (using # or ##) and bullet points (using -)
- The quality and format of the generated content depend on the prompt
- Longer content consumes more tokens
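
The PPTX convention in the note above (`#`/`##` titles and `-` bullets) is straightforward to parse. The sketch below shows one plausible way to split such an outline into slides. `parse_slide_outline` is hypothetical and is not necessarily how text_to_file_generation builds its slides. It also skips any text before the first heading, such as "Create a presentation about AI trends:".

```python
from typing import List, Tuple

Slide = Tuple[str, List[str]]


def parse_slide_outline(text: str) -> List[Slide]:
    """Hypothetical: split '#'/'##' headings and '-' bullets into (title, bullets)."""
    slides: List[Slide] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # "# Title" or "## Title" starts a new slide
            slides.append((line.lstrip("#").strip(), []))
        elif line.startswith("-") and slides:
            # "- point" adds a bullet to the current slide
            slides[-1][1].append(line[1:].strip())
        # Anything else (intro text, blank lines) is ignored in this sketch
    return slides


outline = """Create a presentation about AI trends:

# Introduction
- Overview of AI landscape
- Key developments in 2024

## Current Trends
- Large Language Models
- Multimodal AI
"""
for title, bullets in parse_slide_outline(outline):
    print(title, bullets)
# Introduction ['Overview of AI landscape', 'Key developments in 2024']
# Current Trends ['Large Language Models', 'Multimodal AI']
```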

---

## Best Practices for Generation Tools

### Image Generation

1. **Prompt Quality**:
   - Be specific about style, composition, lighting, and mood
   - Include details about colors, perspective, and atmosphere
   - Use artistic terminology for better results

2. **Cost Management**:
   - Image generation (GPT-4.1) is more expensive than standard API calls
   - Test prompts with understanding tools first
   - Use multi-agent workflows to refine prompts before generation

### Video Generation

1. **Prompt Structure**:
   - Include the setting, lighting, camera movements, and atmosphere
   - Choose a duration (4-20 seconds) that matches the content's complexity
   - Use cinematic terminology (push-in, pull-out, pan, etc.)

2. **Quality Verification**:
   - Combine with the `understand_video` tool for quality checks
   - Use multi-agent workflows for iterative refinement
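
The prompt-structure advice above can be turned into a small builder. It assembles the recommended elements and checks the duration against the 4-20 second range stated in this document. `VideoShot` is a hypothetical helper. Its output would be passed as the `prompt` and `seconds` arguments of text_to_video_generation.

```python
from dataclasses import dataclass

MIN_SECONDS, MAX_SECONDS = 4, 20  # range stated in this document


@dataclass
class VideoShot:
    """Hypothetical container for the prompt elements recommended above."""

    setting: str
    lighting: str = ""
    camera: str = ""
    atmosphere: str = ""
    seconds: int = 4

    def to_prompt(self) -> str:
        # Join the non-empty elements into one comma-separated scene description
        parts = (self.setting, self.lighting, self.camera, self.atmosphere)
        return ", ".join(p.strip() for p in parts if p.strip())

    def validated_seconds(self) -> int:
        # Reject durations outside the documented range instead of clamping them
        if not MIN_SECONDS <= self.seconds <= MAX_SECONDS:
            raise ValueError(f"seconds must be {MIN_SECONDS}-{MAX_SECONDS}, got {self.seconds}")
        return self.seconds


shot = VideoShot("Neon-lit alley at night", "cinematic lighting", "slow push-in", "light rain", seconds=10)
print(shot.to_prompt(), shot.validated_seconds())
# Neon-lit alley at night, cinematic lighting, slow push-in, light rain 10
```

Raising rather than clamping means a request for a 30-second clip fails loudly. A silent clamp would produce a shorter video than the one asked for.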

### Audio/Speech Generation

1. **Voice Selection**:
   - Choose a voice appropriate to the content type
   - Use the expressive model (gpt-4o-audio-preview) for emotional content
   - Use the TTS model (gpt-4o-mini-tts) for simple conversions

2. **Format Selection**:
   - WAV for highest quality
   - MP3 for balanced quality/size
   - OPUS for web streaming

### Document Generation

1. **Format Selection**:
   - TXT for simple content
   - MD for formatted documentation
   - PDF for professional documents
   - PPTX for presentations

2. **Prompt Structure**:
   - Outline the structure clearly
   - Specify sections and formatting
   - For PPTX, use Markdown-style headers and bullets

---

## Additional Resources

- [OpenAI API Documentation](https://platform.openai.com/docs)

@@ -777,3 +1365,4 @@ Error: File does not appear to be a video file

- [python-docx Documentation](https://python-docx.readthedocs.io/)
- [openpyxl Documentation](https://openpyxl.readthedocs.io/)
- [python-pptx Documentation](https://python-pptx.readthedocs.io/)
- [reportlab Documentation](https://www.reportlab.com/documentation/)