massgen 0.1.3__py3-none-any.whl → 0.1.4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.



Files changed (58)
  1. massgen/__init__.py +1 -1
  2. massgen/api_params_handler/_chat_completions_api_params_handler.py +4 -0
  3. massgen/api_params_handler/_claude_api_params_handler.py +4 -0
  4. massgen/api_params_handler/_gemini_api_params_handler.py +4 -0
  5. massgen/api_params_handler/_response_api_params_handler.py +4 -0
  6. massgen/backend/base_with_custom_tool_and_mcp.py +25 -5
  7. massgen/backend/docs/permissions_and_context_files.md +2 -2
  8. massgen/backend/response.py +2 -0
  9. massgen/configs/README.md +49 -40
  10. massgen/configs/tools/custom_tools/crawl4ai_example.yaml +55 -0
  11. massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_multi.yaml +61 -0
  12. massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_single.yaml +29 -0
  13. massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_multi.yaml +51 -0
  14. massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_single.yaml +33 -0
  15. massgen/configs/tools/custom_tools/multimodal_tools/text_to_speech_generation_multi.yaml +55 -0
  16. massgen/configs/tools/custom_tools/multimodal_tools/text_to_speech_generation_single.yaml +33 -0
  17. massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_multi.yaml +47 -0
  18. massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_single.yaml +29 -0
  19. massgen/configs/tools/custom_tools/multimodal_tools/understand_audio.yaml +1 -1
  20. massgen/configs/tools/custom_tools/multimodal_tools/understand_file.yaml +1 -1
  21. massgen/configs/tools/custom_tools/multimodal_tools/understand_image.yaml +1 -1
  22. massgen/configs/tools/custom_tools/multimodal_tools/understand_video.yaml +1 -1
  23. massgen/configs/tools/custom_tools/multimodal_tools/youtube_video_analysis.yaml +1 -1
  24. massgen/filesystem_manager/_filesystem_manager.py +1 -0
  25. massgen/filesystem_manager/_path_permission_manager.py +148 -0
  26. massgen/message_templates.py +160 -12
  27. massgen/orchestrator.py +16 -0
  28. massgen/tests/test_binary_file_blocking.py +274 -0
  29. massgen/tests/test_case_studies.md +12 -12
  30. massgen/tests/test_multimodal_size_limits.py +407 -0
  31. massgen/tool/_manager.py +7 -2
  32. massgen/tool/_multimodal_tools/image_to_image_generation.py +293 -0
  33. massgen/tool/_multimodal_tools/text_to_file_generation.py +455 -0
  34. massgen/tool/_multimodal_tools/text_to_image_generation.py +222 -0
  35. massgen/tool/_multimodal_tools/text_to_speech_continue_generation.py +226 -0
  36. massgen/tool/_multimodal_tools/text_to_speech_transcription_generation.py +217 -0
  37. massgen/tool/_multimodal_tools/text_to_video_generation.py +223 -0
  38. massgen/tool/_multimodal_tools/understand_audio.py +19 -1
  39. massgen/tool/_multimodal_tools/understand_file.py +6 -1
  40. massgen/tool/_multimodal_tools/understand_image.py +112 -8
  41. massgen/tool/_multimodal_tools/understand_video.py +32 -5
  42. massgen/tool/_web_tools/crawl4ai_tool.py +718 -0
  43. massgen/tool/docs/multimodal_tools.md +589 -0
  44. {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/METADATA +96 -69
  45. {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/RECORD +49 -40
  46. massgen/configs/tools/custom_tools/crawl4ai_mcp_example.yaml +0 -67
  47. massgen/configs/tools/custom_tools/crawl4ai_multi_agent_example.yaml +0 -68
  48. massgen/configs/tools/custom_tools/multimodal_tools/playwright_with_img_understanding.yaml +0 -98
  49. massgen/configs/tools/custom_tools/multimodal_tools/understand_video_example.yaml +0 -54
  50. massgen/configs/tools/memory/README.md +0 -199
  51. massgen/configs/tools/memory/gpt5mini_gemini_context_window_management.yaml +0 -131
  52. massgen/configs/tools/memory/gpt5mini_gemini_no_persistent_memory.yaml +0 -133
  53. massgen/configs/tools/memory/test_context_window_management.py +0 -286
  54. massgen/configs/tools/multimodal/gpt5mini_gpt5nano_documentation_evolution.yaml +0 -97
  55. {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/WHEEL +0 -0
  56. {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/entry_points.txt +0 -0
  57. {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/licenses/LICENSE +0 -0
  58. {massgen-0.1.3.dist-info → massgen-0.1.4.dist-info}/top_level.txt +0 -0
@@ -769,6 +769,594 @@ Error: File does not appear to be a video file
769
769
 
770
770
  ---
771
771
 
772
+ ---
773
+
774
+ ## Image Generation Tools
775
+
776
+ ### text_to_image_generation
777
+
778
+ **What it does**: Generates images from text descriptions using OpenAI's GPT-4.1 API **WITHOUT ANY INPUT IMAGES**. Creates new images from scratch based solely on text prompts.
779
+
780
+ **Why use it**: Allows agents to create original visual content from descriptions. Perfect for generating illustrations, concept art, product visualizations, or any creative visual content.
781
+
782
+ **Location**: `massgen.tool._multimodal_tools.text_to_image_generation`
783
+
784
+ #### Parameters
785
+
786
+ - `prompt` (required): Text description of the image to generate
787
+   - Be specific and detailed for better results
788
+   - Include style, composition, lighting, and mood details
789
+ - `model` (optional): Model to use (default: "gpt-4.1")
790
+   - Options: "gpt-4.1"
791
+ - `storage_path` (optional): Directory path where to save the image
792
+   - **IMPORTANT**: Must be a DIRECTORY path only, NOT a file path
793
+   - Example: "images/generated" NOT "images/cat.png"
794
+   - Filename is automatically generated from prompt and timestamp
795
+   - Relative path: Resolved relative to agent's workspace
796
+   - Absolute path: Must be within allowed directories
797
+   - None/empty: Saves to agent's workspace root
798
+ - `allowed_paths` (optional): List of allowed base paths for validation
799
+
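The `storage_path` rules above (directory only, relative paths resolved against the workspace, absolute paths confined to allowed directories, empty meaning the workspace root) can be sketched as follows. This is a minimal illustration, not massgen's implementation: `resolve_storage_dir` and its arguments are hypothetical names, and the "looks like a file" check is a simple suffix heuristic.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_storage_dir(
    storage_path: Optional[str],
    workspace: Path,
    allowed_paths: Iterable[Path] = (),
) -> Path:
    """Hypothetical helper: map a storage_path argument to a directory."""
    workspace = workspace.resolve()
    if not storage_path:
        # None/empty: fall back to the workspace root.
        return workspace
    candidate = Path(storage_path)
    if not candidate.is_absolute():
        # Relative path: resolve against the workspace.
        candidate = workspace / candidate
    candidate = candidate.resolve()
    # The resolved directory must sit inside the workspace or an allowed base.
    bases = [workspace, *(Path(p).resolve() for p in allowed_paths)]
    if not any(candidate == base or base in candidate.parents for base in bases):
        raise ValueError(f"{candidate} is outside the allowed directories")
    if candidate.suffix:
        # Heuristic only: a suffix such as ".png" suggests a file path.
        raise ValueError("storage_path must be a directory, not a file path")
    return candidate
```

Resolving before comparing means a relative path such as `../elsewhere` is rejected rather than silently escaping the workspace.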
800
+ #### Returns
801
+
802
+ ExecutionResult containing:
803
+ - `success`: Whether operation succeeded
804
+ - `operation`: "generate_and_store_image_no_input_images"
805
+ - `note`: Note about operation
806
+ - `images`: List of generated images with file paths and metadata
807
+ - `model`: Model used for generation
808
+ - `prompt`: The prompt used for generation
809
+ - `total_images`: Total number of images generated and saved
810
+
811
+ #### Security Features
812
+
813
+ - Requires valid OpenAI API key
814
+ - Files are saved to specified path within workspace
815
+ - Path must be within allowed directories
816
+ - Automatic timestamp-based filename generation
817
+
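As a rough illustration of timestamp-based filename generation, the sketch below builds a name of the form `<timestamp>_<slug>.<extension>`. The exact slug rule used by the tool is not stated in this document, so `make_filename`, its word limit, and its character filter are assumptions chosen to reproduce the image example shown in this section.

```python
import re
from datetime import datetime


def make_filename(prompt: str, extension: str, now: datetime, max_words: int = 8) -> str:
    """Hypothetical helper: build '<timestamp>_<slug>.<ext>' from a prompt."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Keep lowercase alphanumeric words only, so the name is filesystem-safe.
    words = re.findall(r"[a-z0-9]+", prompt.lower())[:max_words]
    slug = "_".join(words) or "untitled"
    return f"{stamp}_{slug}.{extension}"


print(make_filename("a cat in space", "png", datetime(2024, 1, 15, 14, 30, 22)))
# prints: 20240115_143022_a_cat_in_space.png
```

Passing the clock value in as `now` keeps the helper deterministic, which makes it easy to test.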
818
+ #### Examples
819
+
820
+ **Basic Image Generation**:
821
+
822
+ ```python
823
+ from massgen.tool._multimodal_tools import text_to_image_generation
824
+
825
+ # Generate an image from a text description
826
+ result = await text_to_image_generation(prompt="a cat in space")
827
+
828
+ # Output includes file path and metadata
829
+ print(result.output_blocks[0].data)
830
+ # {
831
+ #   "success": true,
832
+ #   "operation": "generate_and_store_image_no_input_images",
833
+ #   "images": [{
834
+ #     "file_path": "/workspace/20240115_143022_a_cat_in_space.png",
835
+ #     "filename": "20240115_143022_a_cat_in_space.png",
836
+ #     "size": 125340
837
+ #   }],
838
+ #   "total_images": 1
839
+ # }
840
+ ```
841
+
842
+ **Custom Storage Path**:
843
+
844
+ ```python
845
+ # Generate with custom storage location
846
+ result = await text_to_image_generation(
847
+     prompt="sunset over mountains",
848
+     storage_path="art/landscapes"
849
+ )
850
+ ```
851
+
852
+ **Configuration Example**:
853
+
854
+ ```yaml
855
+ # massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_single.yaml
856
+ agents:
857
+   - id: "image_generator"
858
+     backend:
859
+       type: "openai"
860
+       model: "gpt-4o"
861
+       cwd: "workspace1"
862
+       enable_image_generation: true
863
+       custom_tools:
864
+         - name: ["text_to_image_generation"]
865
+           category: "multimodal"
866
+           path: "massgen/tool/_multimodal_tools/text_to_image_generation.py"
867
+           function: ["text_to_image_generation"]
868
+ ```
869
+
870
+ **CLI Usage**:
871
+
872
+ ```bash
873
+ massgen --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_image_generation_single.yaml "Generate an image of a futuristic city at night"
874
+ ```
875
+
876
+ ---
877
+
878
+ ### image_to_image_generation
879
+
880
+ **What it does**: Creates variations based on multiple input images using OpenAI's GPT-4.1 API. Generates new images inspired by existing ones.
881
+
882
+ **Why use it**: Allows agents to create variations, mashups, or transformations of existing images. Perfect for style transfer, image editing, or creating variations of designs.
883
+
884
+ **Location**: `massgen.tool._multimodal_tools.image_to_image_generation`
885
+
886
+ #### Parameters
887
+
888
+ - `base_image_paths` (required): List of paths to base images
889
+   - Supported formats: PNG, JPEG (less than 4MB each)
890
+   - Relative paths: Resolved relative to workspace
891
+   - Absolute paths: Must be within allowed directories
892
+ - `prompt` (optional): Text description for the variation (default: "Create a variation of the provided images")
893
+ - `model` (optional): Model to use (default: "gpt-4.1")
894
+ - `storage_path` (optional): Directory path where to save variations
895
+   - **IMPORTANT**: Must be a DIRECTORY path only
896
+   - Filename is automatically generated
897
+ - `allowed_paths` (optional): List of allowed base paths for validation
898
+
899
+ #### Returns
900
+
901
+ ExecutionResult containing:
902
+ - `success`: Whether operation succeeded
903
+ - `operation`: "generate_and_store_image_with_input_images"
904
+ - `note`: Note about usage
905
+ - `images`: List of generated images with file paths and metadata
906
+ - `model`: Model used for generation
907
+ - `prompt`: The prompt used
908
+ - `total_images`: Total number of images generated
909
+
910
+ #### Security Features
911
+
912
+ - Requires valid OpenAI API key
913
+ - Input images must be valid image files less than 4MB
914
+ - Files are saved to specified path within workspace
915
+ - Path validation for security
916
+
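A pre-flight check for the input constraints above (PNG or JPEG, less than 4MB each) might look like the sketch below. It is not the tool's own validation: `check_base_image` is a hypothetical name, "4MB" is read as 4 MiB, and the format test relies only on the standard PNG and JPEG file signatures.

```python
from pathlib import Path

MAX_IMAGE_BYTES = 4 * 1024 * 1024  # "less than 4MB", read here as 4 MiB (assumption)
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker


def check_base_image(path: Path) -> str:
    """Hypothetical pre-flight check: return 'png' or 'jpeg', else raise ValueError."""
    size = path.stat().st_size
    if size >= MAX_IMAGE_BYTES:
        raise ValueError(f"{path.name} is {size} bytes; the limit is under {MAX_IMAGE_BYTES}")
    with path.open("rb") as fh:
        header = fh.read(8)
    if header.startswith(PNG_MAGIC):
        return "png"
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    raise ValueError(f"{path.name} is not a PNG or JPEG file")
```

Checking the signature rather than the extension catches renamed files before any API call is made.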
917
+ #### Examples
918
+
919
+ **Create Image Variation**:
920
+
921
+ ```python
922
+ from massgen.tool._multimodal_tools import image_to_image_generation
923
+
924
+ # Generate variation from a single image
925
+ result = await image_to_image_generation(
926
+     base_image_paths=["logo.png"],
927
+     prompt="Create a modern variation of this logo"
928
+ )
929
+ ```
930
+
931
+ **Combine Multiple Images**:
932
+
933
+ ```python
934
+ # Generate variation combining multiple images
935
+ result = await image_to_image_generation(
936
+     base_image_paths=["cat.png", "dog.png"],
937
+     prompt="Combine these animals into a single creature"
938
+ )
939
+ ```
940
+
941
+ ---
942
+
943
+ ## Video Generation Tools
944
+
945
+ ### text_to_video_generation
946
+
947
+ **What it does**: Generates videos from text descriptions using OpenAI's Sora-2 API. Creates high-quality video content from detailed scene descriptions.
948
+
949
+ **Why use it**: Allows agents to create video content from descriptions. Perfect for marketing content, concept visualization, educational videos, or social media content.
950
+
951
+ **Location**: `massgen.tool._multimodal_tools.text_to_video_generation`
952
+
953
+ #### Parameters
954
+
955
+ - `prompt` (required): Text description for the video to generate
956
+   - Include scene details, camera movements, lighting, atmosphere
957
+   - Be specific about actions, objects, and environment
958
+ - `model` (optional): Model to use (default: "sora-2")
959
+ - `seconds` (optional): Video duration in seconds (default: 4)
960
+   - Supported range: 4-20 seconds
961
+ - `storage_path` (optional): Directory path where to save the video
962
+   - **IMPORTANT**: Must be a DIRECTORY path only
963
+   - Filename is automatically generated from prompt and timestamp
964
+   - Relative path: Resolved relative to workspace
965
+   - Absolute path: Must be within allowed directories
966
+ - `allowed_paths` (optional): List of allowed base paths for validation
967
+
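Because video generation is slow and billed per request, it can help to reject an out-of-range `seconds` value before calling the tool. The guard below is a sketch based only on the default (4) and range (4-20) documented above. `validate_seconds` is a hypothetical helper, and treating `seconds` as an integer is an assumption.

```python
def validate_seconds(seconds: int = 4, minimum: int = 4, maximum: int = 20) -> int:
    """Hypothetical guard for the documented 4-20 second range."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("seconds must be an integer")
    if not minimum <= seconds <= maximum:
        raise ValueError(f"seconds must be between {minimum} and {maximum}, got {seconds}")
    return seconds
```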
968
+ #### Returns
969
+
970
+ ExecutionResult containing:
971
+ - `success`: Whether operation succeeded
972
+ - `operation`: "generate_and_store_video_no_input_images"
973
+ - `video_path`: Path to the saved video file
974
+ - `filename`: Name of the generated file
975
+ - `size`: File size in bytes
976
+ - `model`: Model used for generation
977
+ - `prompt`: The prompt used
978
+ - `duration`: Time taken for generation in seconds
979
+
980
+ #### Security Features
981
+
982
+ - Requires valid OpenAI API key with Sora-2 access
983
+ - Files are saved to specified path within workspace
984
+ - Automatic video download and storage
985
+
986
+ #### Examples
987
+
988
+ **Basic Video Generation**:
989
+
990
+ ```python
991
+ from massgen.tool._multimodal_tools import text_to_video_generation
992
+
993
+ # Generate a 4-second video
994
+ result = await text_to_video_generation(
995
+     prompt="A cool cat on a motorcycle in the night"
996
+ )
997
+
998
+ # Output includes video path
999
+ print(result.output_blocks[0].data)
1000
+ # {
1001
+ #   "success": true,
1002
+ #   "operation": "generate_and_store_video_no_input_images",
1003
+ #   "video_path": "/workspace/20240115_143022_a_cool_cat_on_motorcycle.mp4",
1004
+ #   "size": 5242880,
1005
+ #   "duration": 45.2
1006
+ # }
1007
+ ```
1008
+
1009
+ **Detailed Scene with Duration**:
1010
+
1011
+ ```python
1012
+ # Generate with detailed prompt and custom duration
1013
+ result = await text_to_video_generation(
1014
+     prompt="Neon-lit alley at night, light rain, slow push-in, cinematic lighting",
1015
+     seconds=10,
1016
+     storage_path="videos/cinematic"
1017
+ )
1018
+ ```
1019
+
1020
+ **Configuration Example**:
1021
+
1022
+ ```yaml
1023
+ # massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_single.yaml
1024
+ agents:
1025
+   - id: "video_generator"
1026
+     backend:
1027
+       type: "openai"
1028
+       model: "gpt-4o"
1029
+       cwd: "workspace1"
1030
+       enable_video_generation: true
1031
+       custom_tools:
1032
+         - name: ["text_to_video_generation"]
1033
+           category: "multimodal"
1034
+           path: "massgen/tool/_multimodal_tools/text_to_video_generation.py"
1035
+           function: ["text_to_video_generation"]
1036
+ ```
1037
+
1038
+ **CLI Usage**:
1039
+
1040
+ ```bash
1041
+ massgen --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_video_generation_single.yaml "Generate a 4 seconds video with neon-lit alley at night, light rain, slow push-in, cinematic."
1042
+ ```
1043
+
1044
+ ---
1045
+
1046
+ ## Audio/Speech Generation Tools
1047
+
1048
+ ### text_to_speech_continue_generation
1049
+
1050
+ **What it does**: Generates expressive speech from text using OpenAI's GPT-4o Audio Preview model. Creates natural-sounding speech with emotional expression and context awareness.
1051
+
1052
+ **Why use it**: Allows agents to generate expressive speech with emotional tone. Perfect for creating voice-overs, narrations, audiobooks, or any content requiring natural, emotional speech.
1053
+
1054
+ **Location**: `massgen.tool._multimodal_tools.text_to_speech_continue_generation`
1055
+
1056
+ #### Parameters
1057
+
1058
+ - `prompt` (required): Text content to convert to audio speech
1059
+ - `model` (optional): Model to use (default: "gpt-4o-audio-preview")
1060
+ - `voice` (optional): Voice to use (default: "alloy")
1061
+   - Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer"
1062
+ - `audio_format` (optional): Audio format for output (default: "wav")
1063
+   - Options: "wav", "mp3", "opus", "aac", "flac"
1064
+ - `storage_path` (optional): Directory path where to save the audio
1065
+   - **IMPORTANT**: Must be a DIRECTORY path only
1066
+   - Filename is automatically generated from prompt and timestamp
1067
+ - `allowed_paths` (optional): List of allowed base paths for validation
1068
+
1069
+ #### Returns
1070
+
1071
+ ExecutionResult containing:
1072
+ - `success`: Whether operation succeeded
1073
+ - `operation`: "generate_and_store_audio_no_input_audios"
1074
+ - `audio_file`: Generated audio file with path and metadata
1075
+ - `model`: Model used for generation
1076
+ - `prompt`: The prompt used for generation
1077
+ - `voice`: Voice used for generation
1078
+ - `format`: Audio format used
1079
+
1080
+ #### Examples
1081
+
1082
+ **Expressive Speech**:
1083
+
1084
+ ```python
1085
+ from massgen.tool._multimodal_tools import text_to_speech_continue_generation
1086
+
1087
+ # Generate expressive speech
1088
+ result = await text_to_speech_continue_generation(
1089
+     prompt="I want you to tell me a very short introduction about Sherlock Holmes in one sentence, and I want you to use emotion voice to read it out loud."
1090
+ )
1091
+ ```
1092
+
1093
+ **Custom Voice and Format**:
1094
+
1095
+ ```python
1096
+ # Generate with specific voice and format
1097
+ result = await text_to_speech_continue_generation(
1098
+     prompt="Hello world",
1099
+     voice="nova",
1100
+     audio_format="mp3",
1101
+     storage_path="audio/generated"
1102
+ )
1103
+ ```
1104
+
1105
+ ---
1106
+
1107
+ ### text_to_speech_transcription_generation
1108
+
1109
+ **What it does**: Converts text directly to speech using OpenAI's TTS API with streaming response. Provides fast, cost-effective text-to-speech conversion.
1110
+
1111
+ **Why use it**: Allows agents to quickly convert text to speech. Perfect for transcription conversion, simple voice-overs, or when expressive emotion is not required.
1112
+
1113
+ **Location**: `massgen.tool._multimodal_tools.text_to_speech_transcription_generation`
1114
+
1115
+ #### Parameters
1116
+
1117
+ - `input_text` (required): The text content to convert to speech
1118
+ - `model` (optional): TTS model to use (default: "gpt-4o-mini-tts")
1119
+   - Options: "gpt-4o-mini-tts", "tts-1", "tts-1-hd"
1120
+ - `voice` (optional): Voice to use (default: "alloy")
1121
+   - Options: "alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "sage"
1122
+ - `instructions` (optional): Optional speaking instructions for tone and style
1123
+   - Example: "Speak in a cheerful tone"
1124
+ - `storage_path` (optional): Directory path where to save the audio file
1125
+   - **IMPORTANT**: Must be a DIRECTORY path only
1126
+ - `audio_format` (optional): Output audio format (default: "mp3")
1127
+   - Options: "mp3", "opus", "aac", "flac", "wav", "pcm"
1128
+ - `allowed_paths` (optional): List of allowed base paths for validation
1129
+
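The two speech tools accept different voices, formats, and default formats, which is easy to mix up. The sketch below collects the options listed in the two Parameters sections into one lookup table and checks arguments against it. `TTS_OPTIONS` and `check_speech_args` are hypothetical names, and the tools themselves may validate differently.

```python
from typing import Optional

# Option lists copied from the Parameters sections of the two speech tools.
TTS_OPTIONS = {
    "text_to_speech_continue_generation": {
        "voices": {"alloy", "echo", "fable", "onyx", "nova", "shimmer"},
        "formats": {"wav", "mp3", "opus", "aac", "flac"},
        "default_format": "wav",
    },
    "text_to_speech_transcription_generation": {
        "voices": {"alloy", "echo", "fable", "onyx", "nova", "shimmer", "coral", "sage"},
        "formats": {"mp3", "opus", "aac", "flac", "wav", "pcm"},
        "default_format": "mp3",
    },
}


def check_speech_args(tool: str, voice: str = "alloy", audio_format: Optional[str] = None) -> dict:
    """Hypothetical check: return the effective voice/format or raise ValueError."""
    options = TTS_OPTIONS[tool]
    audio_format = audio_format or options["default_format"]
    if voice not in options["voices"]:
        raise ValueError(f"voice {voice!r} is not listed for {tool}")
    if audio_format not in options["formats"]:
        raise ValueError(f"format {audio_format!r} is not listed for {tool}")
    return {"voice": voice, "audio_format": audio_format}
```

For example, `"coral"` passes for `text_to_speech_transcription_generation` but is rejected for `text_to_speech_continue_generation`, whose list above does not include it.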
1130
+ #### Returns
1131
+
1132
+ ExecutionResult containing:
1133
+ - `success`: Whether operation succeeded
1134
+ - `operation`: "convert_text_to_speech"
1135
+ - `audio_file`: Generated audio file with path and metadata
1136
+ - `model`: TTS model used
1137
+ - `voice`: Voice used
1138
+ - `format`: Audio format used
1139
+ - `text_length`: Length of input text
1140
+ - `instructions`: Speaking instructions if provided
1141
+
1142
+ #### Examples
1143
+
1144
+ **Simple Text-to-Speech**:
1145
+
1146
+ ```python
1147
+ from massgen.tool._multimodal_tools import text_to_speech_transcription_generation
1148
+
1149
+ # Convert text to speech
1150
+ result = await text_to_speech_transcription_generation(
1151
+     input_text="Hello world, this is a test."
1152
+ )
1153
+ ```
1154
+
1155
+ **With Instructions**:
1156
+
1157
+ ```python
1158
+ # Convert with specific voice and instructions
1159
+ result = await text_to_speech_transcription_generation(
1160
+     input_text="Today is a wonderful day to build something people love!",
1161
+     voice="coral",
1162
+     instructions="Speak in a cheerful and positive tone."
1163
+ )
1164
+ ```
1165
+
1166
+ ---
1167
+
1168
+ ## File Generation Tools
1169
+
1170
+ ### text_to_file_generation
1171
+
1172
+ **What it does**: Generates text content using OpenAI API and saves it as various file formats (TXT, MD, PDF, PPTX). Creates professional documents from text prompts.
1173
+
1174
+ **Why use it**: Allows agents to create formatted documents automatically. Perfect for generating reports, documentation, presentations, or any structured text content.
1175
+
1176
+ **Location**: `massgen.tool._multimodal_tools.text_to_file_generation`
1177
+
1178
+ #### Parameters
1179
+
1180
+ - `prompt` (required): Description of the content to generate
1181
+   - Be specific about structure, sections, and formatting
1181
+   - Example: "Write a technical report about AI"
1182
+ - `file_format` (optional): Output file format (default: "txt")
1183
+   - Options: "txt", "md", "pdf", "pptx"
1184
+ - `filename` (optional): Custom filename without extension
1185
+   - If not provided, generates from prompt and timestamp
1186
+ - `model` (optional): OpenAI model to use (default: "gpt-4o")
1187
+   - Options: "gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo"
1188
+ - `storage_path` (optional): Directory path where to save the file
1189
+   - **IMPORTANT**: Must be a DIRECTORY path only
1190
+   - Filename is automatically generated from prompt or custom filename
1191
+   - Relative path: Resolved relative to workspace
1192
+   - Absolute path: Must be within allowed directories
1194
+ - `allowed_paths` (optional): List of allowed base paths for validation
1195
+
1196
+ #### Returns
1197
+
1198
+ ExecutionResult containing:
1199
+ - `success`: Whether operation succeeded
1200
+ - `operation`: "generate_and_store_file"
1201
+ - `file_path`: Path to the generated file
1202
+ - `filename`: Name of the generated file
1203
+ - `file_format`: Format of the generated file
1204
+ - `content_preview`: First 500 characters of generated content
1205
+ - `file_size`: Size of the generated file in bytes
1206
+ - `model`: Model used for generation
1207
+ - `prompt`: The prompt used
1208
+
1209
+ #### Security Features
1210
+
1211
+ - Requires valid OpenAI API key
1212
+ - Files are saved to specified path within workspace
1213
+ - Path must be within allowed directories
1214
+
1215
+ #### Dependencies
1216
+
1217
+ - PDF generation requires either `reportlab` or `fpdf2` library
1218
+ - PPTX generation requires `python-pptx` library
1219
+
1220
+ ```bash
1221
+ pip install reportlab # For PDF
1222
+ pip install python-pptx # For PPTX
1223
+ ```
1224
+
1225
+ #### Examples
1226
+
1227
+ **Generate Markdown Document**:
1228
+
1229
+ ```python
1230
+ from massgen.tool._multimodal_tools import text_to_file_generation
1231
+
1232
+ # Generate a markdown file
1233
+ result = await text_to_file_generation(
1234
+     prompt="Write a blog post about Python",
1235
+     file_format="md"
1236
+ )
1237
+ ```
1238
+
1239
+ **Generate PDF Report**:
1240
+
1241
+ ```python
1242
+ # Generate a PDF with custom filename
1243
+ result = await text_to_file_generation(
1244
+     prompt="Create a technical report on machine learning",
1245
+     file_format="pdf",
1246
+     filename="ml_report",
1247
+     storage_path="documents/reports"
1248
+ )
1249
+ ```
1250
+
1251
+ **Generate PowerPoint Presentation**:
1252
+
1253
+ ```python
1254
+ # Generate PPTX - structure prompt with slide titles (# or ##) and bullet points (-)
1255
+ result = await text_to_file_generation(
1256
+     prompt="""Create a presentation about AI trends:
1257
+
1258
+ # Introduction
1259
+ - Overview of AI landscape
1260
+ - Key developments in 2024
1261
+
1262
+ # Current Trends
1263
+ - Large Language Models
1264
+ - Multimodal AI
1265
+ - AI Safety
1266
+
1267
+ # Future Outlook
1268
+ - Predictions for 2025
1269
+ - Emerging technologies
1270
+ """,
1271
+     file_format="pptx",
1272
+     filename="ai_trends_presentation"
1273
+ )
1274
+ ```
1275
+
1276
+ **Configuration Example**:
1277
+
1278
+ ```yaml
1279
+ # massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_single.yaml
1280
+ agents:
1281
+   - id: "document_generator"
1282
+     backend:
1283
+       type: "openai"
1284
+       model: "gpt-4o"
1285
+       cwd: "workspace1"
1286
+       enable_file_generation: true
1287
+       custom_tools:
1288
+         - name: ["text_to_file_generation"]
1289
+           category: "multimodal"
1290
+           path: "massgen/tool/_multimodal_tools/text_to_file_generation.py"
1291
+           function: ["text_to_file_generation"]
1292
+ ```
1293
+
1294
+ **CLI Usage**:
1295
+
1296
+ ```bash
1297
+ massgen --config massgen/configs/tools/custom_tools/multimodal_tools/text_to_file_generation_single.yaml "Generate a technical report about LLMs and save as PDF"
1298
+ ```
1299
+
1300
+ #### Note
1301
+
1302
+ - For PPTX format, structure your prompt to include slide titles (using # or ##) and bullet points (using -)
1303
+ - The quality and format of generated content depend on the prompt
1304
+ - Longer content may consume more tokens
1305
+
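To see why the PPTX note asks for `#`/`##` titles and `-` bullets, the sketch below shows one way such an outline could be split into slides. It is an illustration only: `parse_slides` is a hypothetical function and the parser in `text_to_file_generation` may work differently.

```python
from typing import List, Tuple


def parse_slides(text: str) -> List[Tuple[str, List[str]]]:
    """Hypothetical parser: '#'/'##' lines start a slide, '- ' lines add bullets."""
    slides: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # A header opens a new slide; the hashes are not part of the title.
            slides.append((line.lstrip("#").strip(), []))
        elif line.startswith("- ") and slides:
            # A bullet belongs to the most recent slide; bullets before any
            # header are ignored in this sketch.
            slides[-1][1].append(line[2:].strip())
    return slides


outline = """# Introduction
- Overview of AI landscape
- Key developments in 2024

# Current Trends
- Large Language Models
"""
for title, bullets in parse_slides(outline):
    print(title, bullets)
```

Text that matches neither pattern is dropped by this sketch, which is one reason an unstructured prompt can produce a sparse deck.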
1306
+ ---
1307
+
1308
+ ## Best Practices for Generation Tools
1309
+
1310
+ ### Image Generation
1311
+
1312
+ 1. **Prompt Quality**:
1313
+    - Be specific about style, composition, lighting, and mood
1314
+    - Include details about colors, perspective, and atmosphere
1315
+    - Use artistic terminology for better results
1316
+
1317
+ 2. **Cost Management**:
1318
+    - Image generation (GPT-4.1) is more expensive than standard API calls
1319
+    - Test prompts with understanding tools first
1320
+    - Use multi-agent workflows to refine prompts before generation
1321
+
1322
+ ### Video Generation
1323
+
1324
+ 1. **Prompt Structure**:
1325
+    - Include: setting, lighting, camera movements, atmosphere
1326
+    - Specify duration based on content complexity (4-20 seconds)
1327
+    - Use cinematic terminology (push-in, pull-out, pan, etc.)
1328
+
1329
+ 2. **Quality Verification**:
1330
+    - Combine with `understand_video` tool for quality checks
1331
+    - Use multi-agent workflows for iterative refinement
1332
+
1333
+ ### Audio/Speech Generation
1334
+
1335
+ 1. **Voice Selection**:
1336
+    - Choose appropriate voice for content type
1337
+    - Use expressive model (gpt-4o-audio-preview) for emotional content
1338
+    - Use TTS model (gpt-4o-mini-tts) for simple conversions
1339
+
1340
+ 2. **Format Selection**:
1341
+    - WAV for highest quality
1342
+    - MP3 for balanced quality/size
1343
+    - OPUS for web streaming
1344
+
1345
+ ### Document Generation
1346
+
1347
+ 1. **Format Selection**:
1348
+    - TXT for simple content
1349
+    - MD for formatted documentation
1350
+    - PDF for professional documents
1351
+    - PPTX for presentations
1352
+
1353
+ 2. **Prompt Structure**:
1354
+    - Outline structure clearly
1355
+    - Specify sections and formatting
1356
+    - For PPTX, use markdown-style headers and bullets
1357
+
1358
+ ---
1359
+
772
1360
  ## Additional Resources
773
1361
 
774
1362
  - [OpenAI API Documentation](https://platform.openai.com/docs)
@@ -777,3 +1365,4 @@ Error: File does not appear to be a video file
777
1365
  - [python-docx Documentation](https://python-docx.readthedocs.io/)
778
1366
  - [openpyxl Documentation](https://openpyxl.readthedocs.io/)
779
1367
  - [python-pptx Documentation](https://python-pptx.readthedocs.io/)
1368
+ - [reportlab Documentation](https://www.reportlab.com/documentation/)