@laitszkin/apollo-toolkit 2.0.0

Files changed (204)
  1. package/AGENTS.md +62 -0
  2. package/CHANGELOG.md +100 -0
  3. package/LICENSE +21 -0
  4. package/README.md +144 -0
  5. package/align-project-documents/SKILL.md +94 -0
  6. package/align-project-documents/agents/openai.yaml +4 -0
  7. package/analyse-app-logs/LICENSE +21 -0
  8. package/analyse-app-logs/README.md +126 -0
  9. package/analyse-app-logs/SKILL.md +121 -0
  10. package/analyse-app-logs/agents/openai.yaml +4 -0
  11. package/analyse-app-logs/references/investigation-checklist.md +58 -0
  12. package/analyse-app-logs/references/log-signal-patterns.md +52 -0
  13. package/answering-questions-with-research/SKILL.md +46 -0
  14. package/answering-questions-with-research/agents/openai.yaml +4 -0
  15. package/bin/apollo-toolkit.js +7 -0
  16. package/commit-and-push/LICENSE +21 -0
  17. package/commit-and-push/README.md +26 -0
  18. package/commit-and-push/SKILL.md +70 -0
  19. package/commit-and-push/agents/openai.yaml +4 -0
  20. package/commit-and-push/references/branch-naming.md +15 -0
  21. package/commit-and-push/references/commit-messages.md +19 -0
  22. package/deep-research-topics/LICENSE +21 -0
  23. package/deep-research-topics/README.md +43 -0
  24. package/deep-research-topics/SKILL.md +84 -0
  25. package/deep-research-topics/agents/openai.yaml +4 -0
  26. package/develop-new-features/LICENSE +21 -0
  27. package/develop-new-features/README.md +52 -0
  28. package/develop-new-features/SKILL.md +105 -0
  29. package/develop-new-features/agents/openai.yaml +4 -0
  30. package/develop-new-features/references/testing-e2e.md +35 -0
  31. package/develop-new-features/references/testing-integration.md +42 -0
  32. package/develop-new-features/references/testing-property-based.md +44 -0
  33. package/develop-new-features/references/testing-unit.md +37 -0
  34. package/discover-edge-cases/CHANGELOG.md +19 -0
  35. package/discover-edge-cases/LICENSE +21 -0
  36. package/discover-edge-cases/README.md +87 -0
  37. package/discover-edge-cases/SKILL.md +124 -0
  38. package/discover-edge-cases/agents/openai.yaml +4 -0
  39. package/discover-edge-cases/references/architecture-edge-cases.md +41 -0
  40. package/discover-edge-cases/references/code-edge-cases.md +46 -0
  41. package/docs-to-voice/.env.example +106 -0
  42. package/docs-to-voice/CHANGELOG.md +71 -0
  43. package/docs-to-voice/LICENSE +21 -0
  44. package/docs-to-voice/README.md +118 -0
  45. package/docs-to-voice/SKILL.md +107 -0
  46. package/docs-to-voice/agents/openai.yaml +4 -0
  47. package/docs-to-voice/scripts/docs_to_voice.py +1385 -0
  48. package/docs-to-voice/scripts/docs_to_voice.sh +11 -0
  49. package/docs-to-voice/tests/test_docs_to_voice_api_max_chars.py +210 -0
  50. package/docs-to-voice/tests/test_docs_to_voice_sentence_timeline.py +115 -0
  51. package/docs-to-voice/tests/test_docs_to_voice_settings.py +43 -0
  52. package/docs-to-voice/tests/test_docs_to_voice_speech_rate.py +57 -0
  53. package/enhance-existing-features/CHANGELOG.md +35 -0
  54. package/enhance-existing-features/LICENSE +21 -0
  55. package/enhance-existing-features/README.md +54 -0
  56. package/enhance-existing-features/SKILL.md +120 -0
  57. package/enhance-existing-features/agents/openai.yaml +4 -0
  58. package/enhance-existing-features/references/e2e-tests.md +25 -0
  59. package/enhance-existing-features/references/integration-tests.md +30 -0
  60. package/enhance-existing-features/references/property-based-tests.md +33 -0
  61. package/enhance-existing-features/references/unit-tests.md +29 -0
  62. package/feature-propose/LICENSE +21 -0
  63. package/feature-propose/README.md +23 -0
  64. package/feature-propose/SKILL.md +107 -0
  65. package/feature-propose/agents/openai.yaml +4 -0
  66. package/feature-propose/references/enhancement-features.md +25 -0
  67. package/feature-propose/references/important-features.md +25 -0
  68. package/feature-propose/references/mvp-features.md +25 -0
  69. package/feature-propose/references/performance-features.md +25 -0
  70. package/financial-research/SKILL.md +208 -0
  71. package/financial-research/agents/openai.yaml +4 -0
  72. package/financial-research/assets/weekly_market_report_template.md +45 -0
  73. package/fix-github-issues/SKILL.md +98 -0
  74. package/fix-github-issues/agents/openai.yaml +4 -0
  75. package/fix-github-issues/scripts/list_issues.py +148 -0
  76. package/fix-github-issues/tests/test_list_issues.py +127 -0
  77. package/generate-spec/LICENSE +21 -0
  78. package/generate-spec/README.md +61 -0
  79. package/generate-spec/SKILL.md +96 -0
  80. package/generate-spec/agents/openai.yaml +4 -0
  81. package/generate-spec/references/templates/checklist.md +78 -0
  82. package/generate-spec/references/templates/spec.md +55 -0
  83. package/generate-spec/references/templates/tasks.md +35 -0
  84. package/generate-spec/scripts/create-specs +123 -0
  85. package/harden-app-security/CHANGELOG.md +27 -0
  86. package/harden-app-security/LICENSE +21 -0
  87. package/harden-app-security/README.md +46 -0
  88. package/harden-app-security/SKILL.md +127 -0
  89. package/harden-app-security/agents/openai.yaml +4 -0
  90. package/harden-app-security/references/agent-attack-catalog.md +117 -0
  91. package/harden-app-security/references/common-software-attack-catalog.md +168 -0
  92. package/harden-app-security/references/red-team-extreme-scenarios.md +81 -0
  93. package/harden-app-security/references/risk-checklist.md +78 -0
  94. package/harden-app-security/references/security-test-patterns-agent.md +101 -0
  95. package/harden-app-security/references/security-test-patterns-finance.md +88 -0
  96. package/harden-app-security/references/test-snippets.md +73 -0
  97. package/improve-observability/SKILL.md +114 -0
  98. package/improve-observability/agents/openai.yaml +4 -0
  99. package/learn-skill-from-conversations/CHANGELOG.md +15 -0
  100. package/learn-skill-from-conversations/LICENSE +22 -0
  101. package/learn-skill-from-conversations/README.md +47 -0
  102. package/learn-skill-from-conversations/SKILL.md +85 -0
  103. package/learn-skill-from-conversations/agents/openai.yaml +4 -0
  104. package/learn-skill-from-conversations/scripts/extract_recent_conversations.py +369 -0
  105. package/learn-skill-from-conversations/tests/test_extract_recent_conversations.py +176 -0
  106. package/learning-error-book/SKILL.md +112 -0
  107. package/learning-error-book/agents/openai.yaml +4 -0
  108. package/learning-error-book/assets/error_book_template.md +66 -0
  109. package/learning-error-book/scripts/render_markdown_to_pdf.py +367 -0
  110. package/lib/cli.js +338 -0
  111. package/lib/installer.js +225 -0
  112. package/maintain-project-constraints/SKILL.md +109 -0
  113. package/maintain-project-constraints/agents/openai.yaml +4 -0
  114. package/maintain-skill-catalog/README.md +18 -0
  115. package/maintain-skill-catalog/SKILL.md +66 -0
  116. package/maintain-skill-catalog/agents/openai.yaml +4 -0
  117. package/novel-to-short-video/CHANGELOG.md +53 -0
  118. package/novel-to-short-video/LICENSE +21 -0
  119. package/novel-to-short-video/README.md +63 -0
  120. package/novel-to-short-video/SKILL.md +233 -0
  121. package/novel-to-short-video/agents/openai.yaml +4 -0
  122. package/novel-to-short-video/references/plan-template.md +71 -0
  123. package/novel-to-short-video/references/roles-json.md +41 -0
  124. package/open-github-issue/LICENSE +21 -0
  125. package/open-github-issue/README.md +97 -0
  126. package/open-github-issue/SKILL.md +119 -0
  127. package/open-github-issue/agents/openai.yaml +4 -0
  128. package/open-github-issue/scripts/open_github_issue.py +380 -0
  129. package/open-github-issue/tests/test_open_github_issue.py +159 -0
  130. package/open-source-pr-workflow/CHANGELOG.md +32 -0
  131. package/open-source-pr-workflow/LICENSE +21 -0
  132. package/open-source-pr-workflow/README.md +23 -0
  133. package/open-source-pr-workflow/SKILL.md +123 -0
  134. package/open-source-pr-workflow/agents/openai.yaml +4 -0
  135. package/openai-text-to-image-storyboard/.env.example +10 -0
  136. package/openai-text-to-image-storyboard/CHANGELOG.md +49 -0
  137. package/openai-text-to-image-storyboard/LICENSE +21 -0
  138. package/openai-text-to-image-storyboard/README.md +99 -0
  139. package/openai-text-to-image-storyboard/SKILL.md +107 -0
  140. package/openai-text-to-image-storyboard/agents/openai.yaml +4 -0
  141. package/openai-text-to-image-storyboard/scripts/generate_storyboard_images.py +763 -0
  142. package/package.json +36 -0
  143. package/record-spending/SKILL.md +113 -0
  144. package/record-spending/agents/openai.yaml +4 -0
  145. package/record-spending/references/account-format.md +33 -0
  146. package/record-spending/references/workbook-layout.md +84 -0
  147. package/resolve-review-comments/SKILL.md +122 -0
  148. package/resolve-review-comments/agents/openai.yaml +4 -0
  149. package/resolve-review-comments/references/adoption-criteria.md +23 -0
  150. package/resolve-review-comments/scripts/review_threads.py +425 -0
  151. package/resolve-review-comments/tests/test_review_threads.py +74 -0
  152. package/review-change-set/LICENSE +21 -0
  153. package/review-change-set/README.md +55 -0
  154. package/review-change-set/SKILL.md +103 -0
  155. package/review-change-set/agents/openai.yaml +4 -0
  156. package/review-codebases/LICENSE +21 -0
  157. package/review-codebases/README.md +67 -0
  158. package/review-codebases/SKILL.md +109 -0
  159. package/review-codebases/agents/openai.yaml +4 -0
  160. package/scripts/install_skills.ps1 +283 -0
  161. package/scripts/install_skills.sh +262 -0
  162. package/scripts/validate_openai_agent_config.py +194 -0
  163. package/scripts/validate_skill_frontmatter.py +110 -0
  164. package/specs-to-project-docs/LICENSE +21 -0
  165. package/specs-to-project-docs/README.md +57 -0
  166. package/specs-to-project-docs/SKILL.md +111 -0
  167. package/specs-to-project-docs/agents/openai.yaml +4 -0
  168. package/specs-to-project-docs/references/templates/architecture.md +29 -0
  169. package/specs-to-project-docs/references/templates/configuration.md +29 -0
  170. package/specs-to-project-docs/references/templates/developer-guide.md +33 -0
  171. package/specs-to-project-docs/references/templates/docs-index.md +39 -0
  172. package/specs-to-project-docs/references/templates/features.md +25 -0
  173. package/specs-to-project-docs/references/templates/getting-started.md +38 -0
  174. package/specs-to-project-docs/references/templates/readme.md +49 -0
  175. package/systematic-debug/LICENSE +21 -0
  176. package/systematic-debug/README.md +81 -0
  177. package/systematic-debug/SKILL.md +59 -0
  178. package/systematic-debug/agents/openai.yaml +4 -0
  179. package/text-to-short-video/.env.example +36 -0
  180. package/text-to-short-video/LICENSE +21 -0
  181. package/text-to-short-video/README.md +82 -0
  182. package/text-to-short-video/SKILL.md +221 -0
  183. package/text-to-short-video/agents/openai.yaml +4 -0
  184. package/text-to-short-video/scripts/enforce_video_aspect_ratio.py +350 -0
  185. package/version-release/CHANGELOG.md +53 -0
  186. package/version-release/LICENSE +21 -0
  187. package/version-release/README.md +28 -0
  188. package/version-release/SKILL.md +94 -0
  189. package/version-release/agents/openai.yaml +4 -0
  190. package/version-release/references/branch-naming.md +15 -0
  191. package/version-release/references/changelog-writing.md +8 -0
  192. package/version-release/references/commit-messages.md +19 -0
  193. package/version-release/references/readme-writing.md +12 -0
  194. package/version-release/references/semantic-versioning.md +12 -0
  195. package/video-production/CHANGELOG.md +104 -0
  196. package/video-production/LICENSE +18 -0
  197. package/video-production/README.md +68 -0
  198. package/video-production/SKILL.md +213 -0
  199. package/video-production/agents/openai.yaml +4 -0
  200. package/video-production/references/plan-template.md +54 -0
  201. package/video-production/references/roles-json.md +41 -0
  202. package/weekly-financial-event-report/SKILL.md +195 -0
  203. package/weekly-financial-event-report/agents/openai.yaml +4 -0
  204. package/weekly-financial-event-report/assets/financial_event_report_template.md +53 -0
@@ -0,0 +1,46 @@
1
+ # Common Code-level Edge Cases (Reference List)
2
+
3
+ ## How to use
4
+ - Pick only 2-5 items directly related to the current change.
5
+ - Prioritize observable failures and high-risk inputs.
6
+
7
+ ## Input and typing
8
+ - Null/missing fields: None/null, empty string, empty collection
9
+ - Unexpected types: string-number mixing, boolean-integer confusion
10
+ - Oversized input: long strings, large arrays, deeply nested objects
11
+ - Encoding issues: UTF-8/non-ASCII, invisible characters
12
+
13
+ ## Boundaries and numerics
14
+ - Off-by-one: index 0/1 and length boundaries
15
+ - Overflow/underflow: integer/timestamp boundaries
16
+ - NaN/Inf: floating-point special values
17
+ - Precision loss: money/ratio calculations
18
+ - Negative values where invalid
19
+
20
+ ## Structure and ordering
21
+ - Duplicate elements: dedup/accumulation logic
22
+ - Ordering assumptions: sorting stability, input-order dependence
23
+ - Empty/singleton collections: reduce/min/max/avg behavior
24
+ - Mutable/immutable mismatch: in-place mutation of input data
25
+
26
+ ## Exceptions and error handling
27
+ - Parsing failures: date/timezone, JSON, CSV
28
+ - External dependency failures: 429/500/timeout
29
+ - Swallowed errors: bare `except: pass` or missing logs
30
+ - Recovery strategy: retry count, backoff, degradation
31
+
32
+ ## State and side effects
33
+ - Reentrancy: same request invoked multiple times
34
+ - Global state contamination: cache/singleton bleed-through
35
+ - Mutable default parameters: Python list/dict defaults
36
+ - Resource release: file/connection not closed
37
+
38
+ ## Security and validation
39
+ - Missing or insufficient authorization checks
40
+ - Validation bypass via null/0/False
41
+ - Path/injection risks from string concatenation
42
+
43
+ ## Performance and limits
44
+ - N+1 query patterns inside loops
45
+ - Large-data stress: timeout/memory pressure
46
+ - Hotspots: lock contention under high-frequency calls
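Two of the items above ("Mutable default parameters" and "Validation bypass via null/0/False") can be illustrated with a minimal Python sketch. The function names here are hypothetical and are not part of the package; they only show what the failure looks like and one common way to avoid it.

```python
from typing import Optional


def append_item_buggy(item: str, bucket: list = []) -> list:
    # Bug: the default list is created once, at definition time,
    # so every call without `bucket` shares the same list.
    bucket.append(item)
    return bucket


def append_item_fixed(item: str, bucket: Optional[list] = None) -> list:
    # Use None as the sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


def parse_limit(raw: Optional[int]) -> Optional[int]:
    """Return None when the limit is absent, otherwise a non-negative int.

    Checking `if not raw` would treat 0 and False like a missing value,
    which is how a falsy input can slip past validation.
    """
    if raw is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise TypeError(f"limit must be an int, got {type(raw).__name__}")
    if raw < 0:
        raise ValueError("limit must not be negative")
    return raw
```

A test for this kind of edge case would call the function twice without the optional argument and check that the second result does not contain the first call's data.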
@@ -0,0 +1,106 @@
1
+ # docs-to-voice environment example
2
+ #
3
+ # MODE:
4
+ # - say: macOS built-in `say`
5
+ # - api: Alibaba Cloud Model Studio (Bailian) TTS API
6
+ DOCS_TO_VOICE_MODE="say"
7
+
8
+ # -----------------------------
9
+ # say mode defaults
10
+ # -----------------------------
11
+ # If --voice is provided, it overrides this value.
12
+ DOCS_TO_VOICE_VOICE="Eddy (中文(台灣))"
13
+
14
+ # -----------------------------
15
+ # api mode defaults (Model Studio)
16
+ # -----------------------------
17
+ # Required for mode=api.
18
+ DASHSCOPE_API_KEY=""
19
+ #
20
+ # Region endpoint (direct Model Studio API; choose ONE):
21
+ # - China Mainland (Beijing):
22
+ # https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
23
+ # - International (Singapore):
24
+ # https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
25
+ # - US (Virginia):
26
+ # https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
27
+ #
28
+ # NOTE:
29
+ # - API Key and endpoint region must match.
30
+ # - If you use OpenAI-compatible mode in other clients, base URL is:
31
+ # - https://dashscope.aliyuncs.com/compatible-mode/v1
32
+ # - https://dashscope-intl.aliyuncs.com/compatible-mode/v1
33
+ # - https://dashscope-us.aliyuncs.com/compatible-mode/v1
34
+ DOCS_TO_VOICE_API_ENDPOINT="https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation"
35
+
36
+ # Model options (choose one available in your region/account):
37
+ # - qwen3-tts
38
+ # - qwen3-tts-flash
39
+ # - qwen3-tts-instruct-flash
40
+ # - qwen-tts
41
+ # - qwen-tts-latest
42
+ DOCS_TO_VOICE_API_MODEL="qwen3-tts"
43
+
44
+ # Voice options (DOCS_TO_VOICE_API_VOICE) and style:
45
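+ # - Cherry: 芊悅 | sunny, upbeat, warm and natural young woman (female)
46
+ # - Serena: 蘇瑤 | gentle young woman (female)
47
+ # - Ethan: 晨煦 | sunny, warm, energetic male voice (male)
48
+ # - Chelsie: 千雪 | anime-style virtual girlfriend (female)
49
+ # - Momo: 茉兔 | coquettish and playful (female)
50
+ # - Vivian: 十三 | sassy, cute, slightly hot-tempered (female)
51
+ # - Moon: 月白 | carefree and cool (male)
52
+ # - Maia: 四月 | intellectual and gentle (female)
53
+ # - Kai: 凱 | general-purpose, neutral
54
+ # - Nofish: 不吃魚 | designer who cannot pronounce retroflex sounds (male)
55
+ # - Bella: 萌寶 | little-girl voice (female)
56
+ # - Jennifer: 詹妮弗 | brand-quality American English female voice (female)
57
+ # - Ryan: 甜茶 | strong rhythm, highly dramatic (male)
58
+ # - Katerina: 卡捷琳娜 | mature, commanding female timbre (female)
59
+ # - Aiden: 艾登 | warm American English male voice (male)
60
+ # - Eldric Sage: 滄明子 | calm, wise elder (male)
61
+ # - Mia: 乖小妹 | gentle and well-behaved (female)
62
+ # - Mochi: 沙小彌 | precocious child voice (male)
63
+ # - Bellona: 燕錚鶯 | resonant, strong dramatic tension (female)
64
+ # - Vincent: 田叔 | husky, smoky voice (male)
65
+ # - Bunny: 萌小姬 | cute "moe" girl (female)
66
+ # - Neil: 阿聞 | news-anchor style (male)
67
+ # - Elias: 墨講師 | intellectual lecturer style (female)
68
+ # - Arthur: 徐大爺 | plain-spoken elder narrator (male)
69
+ # - Nini: 鄰家妹妹 | sweet, soft girl-next-door (female)
70
+ # - Ebona: 詭婆婆 | eerie, thriller style (female)
71
+ # - Seren: 小婉 | soothing, sleep-aid style (female)
72
+ # - Pip: 頑屁小孩 | mischievous and childlike (male)
73
+ # - Stella: 少女阿月 | energetic teenage girl (female)
74
+ # - Bodega: 博德加 | enthusiastic Spanish-speaking older man (male)
75
+ # - Sonrisa: 索尼莎 | enthusiastic Latin American female voice (female)
76
+ # - Alek: 阿列克 | Russian male voice with a cold-warm contrast (male)
77
+ # - Dolce: 多爾切 | laid-back Italian older man (male)
78
+ # - Sohee: 素熙 | gentle, cheerful Korean female voice (female)
79
+ # - Ono Anna: 小野杏 | quirky, impish Japanese female voice (female)
80
+ # - Lenn: 萊恩 | rational young German man (male)
81
+ # - Emilien: 埃米爾安 | romantic French male voice (male)
82
+ # - Andre: 安德雷 | calm, magnetic male voice (male)
83
+ # - Radio Gol: 拉迪奧·戈爾 | football commentary style
84
+ # - Jada: 上海-阿珍 | Shanghainese (female)
85
+ # - Dylan: 北京-曉東 | Beijing dialect (male)
86
+ # - Li: 南京-老李 | Nanjing dialect (male)
87
+ # - Marcus: 陝西-秦川 | Shaanxi dialect (male)
88
+ # - Roy: 閩南-阿杰 | Southern Min (Hokkien) (male)
89
+ # - Peter: 天津-李彼得 | Tianjin dialect, crosstalk style (male)
90
+ # - Sunny: 四川-晴兒 | sweet Sichuanese female voice (female)
91
+ # - Eric: 四川-程川 | Sichuanese (male)
92
+ # - Rocky: 粵語-阿強 | Cantonese (male)
93
+ # - Kiki: 粵語-阿清 | Cantonese (female)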
94
+ DOCS_TO_VOICE_API_VOICE="Cherry"
95
+
96
+ # Long text chunking:
97
+ # - Empty means api mode auto-discovers model max input length before chunking.
98
+ # - Set a positive integer to force a custom chunk limit.
99
+ # - 0 disables chunking.
100
+ DOCS_TO_VOICE_MAX_CHARS=""
101
+
102
+ # Optional post-process speech speed multiplier:
103
+ # - Empty keeps original speed.
104
+ # - Set a positive number (e.g. 1.2 faster, 0.8 slower).
105
+ # - Requires ffmpeg when value is not 1.
106
+ DOCS_TO_VOICE_SPEECH_RATE=""
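The three-way meaning of `DOCS_TO_VOICE_MAX_CHARS` described in the comments above (empty, positive integer, `0`) is easy to get wrong, because an empty string and `0` are both falsy. A minimal sketch of one way to parse it follows; `parse_max_chars` is a hypothetical helper written for illustration and is not taken from `scripts/docs_to_voice.py`.

```python
from typing import Optional


def parse_max_chars(raw: Optional[str]) -> Optional[int]:
    """Interpret a DOCS_TO_VOICE_MAX_CHARS-style value.

    Returns:
        None -> value was empty: let api mode auto-discover the limit
        0    -> chunking disabled
        n>0  -> forced chunk limit
    """
    if raw is None or raw.strip() == "":
        return None
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError(f"max chars must be an integer, got {raw!r}") from None
    if value < 0:
        raise ValueError(f"max chars must not be negative, got {value}")
    return value
```

Returning `None` rather than `0` for the empty case keeps "not set" and "explicitly disabled" distinguishable for the caller.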
@@ -0,0 +1,71 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on Keep a Changelog.
6
+
7
+ ## [Unreleased]
8
+
9
+ ## [0.5.0] - 2026-02-18
10
+
11
+ ### Added
12
+ - Added `--speech-rate` (and `DOCS_TO_VOICE_SPEECH_RATE`) for optional ffmpeg-based post-process speed adjustment.
13
+ - Added speech-rate unit tests for input validation, filter-chain generation, and timeline duration scaling.
14
+
15
+ ### Fixed
16
+ - Rejected non-finite `--speech-rate` values (for example `nan`/`inf`) to avoid invalid ffmpeg filter generation.
17
+
18
+ ## [0.4.0] - 2026-02-15
19
+
20
+ ### Changed
21
+ - API mode now sends TTS requests sentence-by-sentence and still merges all generated audio into one final output file.
22
+ - API sentence timelines now prefer measured per-sentence audio durations to produce more precise subtitle timestamps.
23
+
24
+ ### Added
25
+ - Added `split_text_into_api_sentence_requests` to keep sentence boundaries in API synthesis while still splitting only oversized sentences by `--max-chars`.
26
+ - Added timeline tests covering sentence-based API request splitting and sentence-duration timestamp output.
27
+
28
+ ## [0.3.0] - 2026-02-15
29
+
30
+ ### Added
31
+ - Added API-mode max-length discovery that first checks model catalog metadata and then probes API length validation when metadata is unavailable.
32
+
33
+ ### Changed
34
+ - API mode now auto-splits long text by discovered max input length when `--max-chars` (or `DOCS_TO_VOICE_MAX_CHARS`) is not provided.
35
+ - API chunking now applies weighted input-unit counting for CJK text to match qwen3-tts length rules.
36
+
37
+ ### Fixed
38
+ - Added explicit `0` handling for `--max-chars` / `DOCS_TO_VOICE_MAX_CHARS` so users can still disable chunking.
39
+ - Wrapped transient HTTP disconnect exceptions as user-facing CLI errors instead of raw tracebacks.
40
+
41
+ ## [0.2.1] - 2026-02-15
42
+
43
+ ### Added
44
+ - Added unit tests for `resolve_setting` to lock configuration precedence and fallback behavior.
45
+
46
+ ### Fixed
47
+ - Fixed configuration precedence so omitted CLI mode/options now prefer `.env` values before shell environment variables.
48
+ - Prevented blank CLI values from incorrectly overriding `.env` settings.
49
+
50
+ ## [0.2.0] - 2026-02-14
51
+
52
+ ### Added
53
+ - Added `scripts/docs_to_voice.py` as the primary CLI implementation for both `say` and `api` modes.
54
+ - Added built-in Python API handling for Model Studio requests/responses (URL and base64 audio paths) to simplify API mode operations.
55
+ - Added long-text chunking via `--max-chars` and `DOCS_TO_VOICE_MAX_CHARS`, with automatic per-chunk synthesis and final audio concatenation.
56
+
57
+ ### Changed
58
+ - Replaced `scripts/docs_to_voice.sh` logic with a lightweight compatibility wrapper that delegates to the Python CLI.
59
+ - Updated README, SKILL, `.env.example`, and agent prompt guidance to use the Python script as the default entrypoint and document chunked generation for long text.
60
+
61
+ ## [0.1.0] - 2026-02-14
62
+
63
+ ### Added
64
+ - Added `api` mode for Alibaba Cloud Model Studio TTS, including endpoint/model/voice configuration and API key support.
65
+ - Added sentence-level timeline outputs for each generated audio file: `.timeline.json` and `.srt`.
66
+ - Added timeline metadata with per-sentence start/end offsets (seconds and milliseconds) to support subtitle alignment.
67
+
68
+ ### Changed
69
+ - Updated `.env.example` with full mode selection, endpoint guidance, and API voice/model defaults.
70
+ - Updated skill and README documentation to describe dual-mode generation and subtitle timeline artifacts.
71
+ - Updated agent default prompt to include timeline generation behavior.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 tszkinlai
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,118 @@
1
+ # docs-to-voice
2
+
3
+ Convert text into voice files and always write outputs to:
4
+
5
+ `project_dir/audio/{project_name}/`
6
+
7
+ Each run also generates subtitle timeline files in the same folder:
8
+
9
+ - `{audio_name_without_extension}.timeline.json`: sentence-level subtitle start/end timestamps (seconds and milliseconds)
10
+ - `{audio_name_without_extension}.srt`: ready-to-use subtitle track file
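The output layout described above can be sketched as follows. This is an illustration only: `plan_output_paths` is a hypothetical helper, and the `.wav` default extension is an assumption, since the README only says the extension depends on the mode.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional


def plan_output_paths(
    project_dir: str,
    project_name: Optional[str] = None,
    output_name: Optional[str] = None,
    extension: str = ".wav",  # assumption: the real extension depends on the mode
    now: Optional[datetime] = None,
) -> Dict[str, Path]:
    """Return the audio path and its timeline/subtitle companions."""
    root = Path(project_dir)
    # project_name defaults to the folder name of project_dir.
    name = project_name or root.name
    out_dir = root / "audio" / name
    if output_name is None:
        stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
        output_name = f"voice-{stamp}{extension}"
    stem = Path(output_name).stem  # audio name without its extension
    return {
        "audio": out_dir / output_name,
        "timeline": out_dir / f"{stem}.timeline.json",
        "srt": out_dir / f"{stem}.srt",
    }
```

Calling it with `project_dir="/path/to/project"` and `output_name="intro.wav"` yields paths under `/path/to/project/audio/project/`, with `intro.timeline.json` and `intro.srt` next to the audio file.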
11
+
12
+ Two modes are supported:
13
+
14
+ - `say`: built-in macOS `say`
15
+ - `api`: Alibaba Cloud Model Studio TTS API (for example `qwen3-tts`)
16
+
17
+ ## Features
18
+
19
+ - Supports both `--text` and `--input-file`
20
+ - Supports `--mode say|api`
21
+ - Supports `.env` for API key/model/voice settings
22
+ - Supports `--output-name` for deterministic output filename
23
+ - `say` mode enables punctuation prosody enhancement by default (disable with `--no-auto-prosody`)
24
+ - `api` mode sends sentence-level TTS requests and merges output into one audio file
25
+ - `api` mode auto-detects model max input length and segments over-limit sentences
26
+ - `--max-chars` (or `.env`) can override segment length limit manually
27
+ - `--speech-rate` (or `.env`) can apply post-processing speech-rate adjustment (requires `ffmpeg`)
28
+ - API timeline prefers real per-sentence audio duration for higher subtitle precision
29
+ - Automatically outputs sentence-level timeline files (`.timeline.json` + `.srt`)
30
+
31
+ ## Requirements
32
+
33
+ - `say` mode: macOS + `say` + `python3`
34
+ - `api` mode: `python3` + Model Studio API key
35
+ - Long-text merge workflow: `ffmpeg` recommended (especially for AIFF output)
36
+
37
+ ## Quick start
38
+
39
+ ### 1) say mode
40
+
41
+ ```bash
42
+ python3 scripts/docs_to_voice.py \
43
+ --project-dir "/path/to/project" \
44
+ --mode say \
45
+ --text "Hello, this is a voice synthesis test."
46
+ ```
47
+
48
+ ### 2) api mode (Model Studio)
49
+
50
+ ```bash
51
+ python3 scripts/docs_to_voice.py \
52
+ --project-dir "/path/to/project" \
53
+ --mode api \
54
+ --text "Hello, this is a qwen3-tts test."
55
+ ```
56
+
57
+ > Compatibility note: `scripts/docs_to_voice.sh` still works and internally delegates to the Python script.
58
+
59
+ ## `.env` settings
60
+
61
+ 1. Copy template
62
+
63
+ ```bash
64
+ cp .env.example .env
65
+ ```
66
+
67
+ 2. Configure mode and API parameters (example)
68
+
69
+ ```env
70
+ DOCS_TO_VOICE_MODE="api"
71
+ DASHSCOPE_API_KEY="sk-xxx"
72
+ DOCS_TO_VOICE_API_ENDPOINT="https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation"
73
+ DOCS_TO_VOICE_API_MODEL="qwen3-tts"
74
+ DOCS_TO_VOICE_API_VOICE="Cherry"
75
+ DOCS_TO_VOICE_MAX_CHARS=""
76
+ DOCS_TO_VOICE_SPEECH_RATE=""
77
+ ```
78
+
79
+ > CLI args `--mode`, `--api-model`, `--api-voice`, `--api-endpoint`, and `--speech-rate` override `.env`. If omitted, values are loaded from `.env` first, then fall back to the same-name shell environment variables.
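The precedence in the note above (CLI argument, then `.env`, then shell environment) can be sketched as a small lookup. The changelog mentions a `resolve_setting` function, but its signature is not shown in this diff, so the helper below uses a different, hypothetical name and interface. Treating a blank `.env` value as "not set" is also an assumption.

```python
from typing import Mapping, Optional


def resolve_setting_sketch(
    cli_value: Optional[str],
    key: str,
    dotenv_values: Mapping[str, str],
    shell_env: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Pick the first non-blank value: CLI, then .env, then shell environment."""
    candidates = (cli_value, dotenv_values.get(key), shell_env.get(key))
    for candidate in candidates:
        # Blank values are skipped so they cannot override a lower-priority source.
        if candidate is not None and candidate.strip() != "":
            return candidate.strip()
    return default
```

In real use, `shell_env` would typically be `os.environ` and `dotenv_values` a dict parsed from the `.env` file.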
80
+
81
+ ## Parameters
82
+
83
+ ```text
84
+ --project-dir DIR required, project root path
85
+ --text TEXT choose one of --text / --input-file
86
+ --input-file FILE choose one of --text / --input-file
87
+ --project-name NAME optional, defaults to project_dir folder name
88
+ --output-name NAME optional, defaults to voice-YYYYmmdd-HHMMSS + mode extension
89
+ --env-file FILE optional, default: skill-folder/.env
90
+ --mode MODE optional, say or api
91
+ --voice NAME optional for say mode
92
+ --rate N optional for say mode
93
+ --speech-rate N optional, speech-rate multiplier (>0; 1.2=faster, 0.8=slower; requires ffmpeg)
94
+ --api-endpoint URL optional for api mode (Model Studio endpoint)
95
+ --api-model NAME optional for api mode
96
+ --api-voice NAME optional for api mode
97
+ --max-chars N optional, manual segment limit (api mode auto-detects if omitted; 0 means no segmentation)
98
+ --no-auto-prosody optional for say mode, disable punctuation prosody enhancement
99
+ --force optional, overwrite existing files
100
+ ```
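For `--speech-rate`, the changelog notes that non-finite values such as `nan` and `inf` are rejected and that an ffmpeg filter chain is generated. The sketch below shows one way this could work. It assumes the ffmpeg `atempo` audio filter, which the source does not name, and it chains several `atempo` stages so each factor stays within 0.5 to 2.0. Both function names are hypothetical.

```python
import math
from typing import Optional


def parse_speech_rate(raw: Optional[str]) -> Optional[float]:
    """Return None for an empty value, otherwise a finite rate greater than 0."""
    if raw is None or raw.strip() == "":
        return None
    value = float(raw)  # raises ValueError for non-numeric text
    # float() accepts "nan" and "inf", so they must be rejected explicitly.
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"speech rate must be a finite number > 0, got {raw!r}")
    return value


def atempo_chain(rate: float, low: float = 0.5, high: float = 2.0) -> str:
    """Build a comma-separated atempo chain whose factors multiply to `rate`."""
    if not math.isfinite(rate) or rate <= 0:
        raise ValueError("rate must be a finite number > 0")
    factors = []
    remaining = rate
    while remaining > high:   # peel off 2.0x stages for large speed-ups
        factors.append(high)
        remaining /= high
    while remaining < low:    # peel off 0.5x stages for large slow-downs
        factors.append(low)
        remaining /= low
    factors.append(remaining)
    return ",".join(f"atempo={factor:.6g}" for factor in factors)
```

Under the same assumption, timeline durations would be divided by the rate so that subtitles stay aligned with the sped-up or slowed-down audio.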
101
+
102
+ ## Long-text notes
103
+
104
+ - API mode uses sentence-level TTS by default and only segments when a sentence exceeds model limits.
105
+ - If you already know a suitable limit, set `--max-chars` (or `DOCS_TO_VOICE_MAX_CHARS` in `.env`).
106
+ - For many `qwen3-tts` models, CJK chars typically count as 2 units and other chars as 1 unit; this script segments using that convention.
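The weighted counting convention in the last note can be sketched as below. The Unicode ranges used to decide what counts as a CJK character are an assumption made for illustration; the exact rule applied by the model or by the script is not shown in this diff, and the function names are hypothetical.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """Rough CJK ideograph test (assumed ranges, not the script's actual rule)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= code <= 0x4DBF     # Extension A
        or 0xF900 <= code <= 0xFAFF     # Compatibility Ideographs
        or 0x20000 <= code <= 0x2A6DF   # Extension B
    )


def weighted_length(text: str) -> int:
    """Count CJK characters as 2 units and everything else as 1."""
    return sum(2 if is_cjk(ch) else 1 for ch in text)


def split_by_units(text: str, max_units: int) -> List[str]:
    """Split text into chunks of at most max_units weighted units.

    max_units <= 0 means no segmentation. A single character that costs
    more than max_units is still emitted as its own chunk.
    """
    if max_units <= 0:
        return [text] if text else []
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for ch in text:
        cost = 2 if is_cjk(ch) else 1
        if current and used + cost > max_units:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks
```

This splits on raw character boundaries only. Per the notes above, it would apply just to a sentence that already exceeds the limit, after sentence-level splitting has been done.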
107
+
108
+ ## Endpoint note
109
+
110
+ If you need direct Model Studio `qwen3-tts` integration, use:
111
+
112
+ `https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation`
113
+
114
+ `https://dashscope-intl.aliyuncs.com/compatible-mode/v1` is the OpenAI-compatible base URL (typically for chat/completions), not the default direct Model Studio TTS endpoint used by this script.
115
+
116
+ ## License
117
+
118
+ MIT License (see `LICENSE`).
@@ -0,0 +1,107 @@
1
+ ---
2
+ name: docs-to-voice
3
+ description: Convert text and document content into audio files and sentence-level subtitle timelines under project_dir/audio/{project_name}/. Supports both macOS say and Alibaba Cloud Model Studio API modes.
4
+ ---
5
+
6
+ # Docs to Voice
7
+
8
+ ## Dependencies
9
+
10
+ - Required: none.
11
+ - Conditional: none.
12
+ - Optional: none.
13
+ - Fallback: not applicable.
14
+
15
+ ## Standards
16
+
17
+ - Evidence: Confirm `project_dir`, input source, mode, and environment-backed settings before generation.
18
+ - Execution: Use `scripts/docs_to_voice.py` to write audio plus matching timeline and subtitle files under `project_dir/audio/{project_name}/`.
19
+ - Quality: Respect mode-specific options, sentence splitting rules, and post-process requirements such as `ffmpeg` for speed changes.
20
+ - Output: Return the absolute output audio path together with the generated `.timeline.json` and `.srt` companions.
21
+
22
+ ## Overview
23
+
24
+ Use `scripts/docs_to_voice.py` to convert raw text or text files into audio and always save under:
25
+
26
+ `project_dir/audio/{project_name}/`
27
+
28
+ Alongside each audio file, the script also writes:
29
+
30
+ - `{audio_name_without_extension}.timeline.json`
31
+ - `{audio_name_without_extension}.srt`
32
+
33
+ Modes:
34
+
35
+ - `say`: local macOS `say`
36
+ - `api`: Alibaba Cloud Model Studio TTS API (for example `qwen3-tts`)
37
+
38
+ ## Workflow
39
+
40
+ 1. Collect inputs.
41
+ - Require `project_dir`.
42
+ - Accept either raw text or one input text file.
43
+ - Set `project_name`; default to basename of `project_dir`.
44
+
45
+ 2. Select mode.
46
+ - `--mode say` for local generation.
47
+ - `--mode api` for Model Studio API generation.
48
+ - If omitted, load `DOCS_TO_VOICE_MODE` from `.env`, then from shell environment variables; fall back to `say`.
49
+
50
+ 3. Prepare output path.
51
+ - Build `project_dir/audio/{project_name}/`.
52
+ - Create directory if it does not exist.
53
+
54
+ 4. Generate audio.
55
+ - `say` mode supports `--voice`, `--rate`, and punctuation-pause enhancement.
56
+ - `api` mode supports `--api-endpoint`, `--api-model`, `--api-voice`, and reads `DASHSCOPE_API_KEY`.
57
+ - `api` mode sends one request per sentence and concatenates all sentence audio into one final file.
58
+ - `api` mode auto-discovers the model's max input length; only oversized sentences are split by that limit.
59
+ - `--max-chars` (or `DOCS_TO_VOICE_MAX_CHARS`) can override the sentence split limit; `0` disables chunking.
60
+ - `--speech-rate` (or `DOCS_TO_VOICE_SPEECH_RATE`) applies optional post-process speed adjustment and requires `ffmpeg` when value is not `1`.
61
+ - API splitting uses model counting rules (for `qwen3-tts`, CJK chars count as 2 units).
62
+
63
+ 5. Generate sentence-level timeline files.
64
+ - Write JSON timeline and SRT subtitle files next to audio output.
65
+ - In `api` mode, timeline start/end times use per-sentence audio durations whenever available.
66
+
67
+ 6. Return completion details.
68
+ - Report absolute output audio path.
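Step 5 of the workflow above (building a sentence-level timeline from per-sentence durations and writing an SRT file) can be sketched as follows. The JSON field names and function names are hypothetical; the actual schema of `.timeline.json` is not shown in this diff. The timestamp layout `HH:MM:SS,mmm` follows the usual SubRip convention.

```python
from typing import Dict, List, Sequence


def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_timeline(sentences: Sequence[str], durations: Sequence[float]) -> List[Dict]:
    """Lay sentences end to end using their measured audio durations."""
    if len(sentences) != len(durations):
        raise ValueError("need exactly one duration per sentence")
    entries: List[Dict] = []
    cursor = 0.0
    for index, (text, duration) in enumerate(zip(sentences, durations), start=1):
        start, end = cursor, cursor + duration
        entries.append({
            "index": index,
            "text": text,
            "start_seconds": start,
            "end_seconds": end,
            "start_ms": round(start * 1000),
            "end_ms": round(end * 1000),
        })
        cursor = end
    return entries


def render_srt(entries: List[Dict]) -> str:
    """Render timeline entries as SRT cues separated by blank lines."""
    blocks = []
    for entry in entries:
        start = format_srt_timestamp(entry["start_seconds"])
        end = format_srt_timestamp(entry["end_seconds"])
        blocks.append(f"{entry['index']}\n{start} --> {end}\n{entry['text']}\n")
    return "\n".join(blocks)
```

Because each cue starts where the previous one ends, any error in a measured duration shifts all later cues, which is one reason to prefer measured per-sentence durations over estimates.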
69
+
70
+ ## Script Reference
71
+
72
+ `scripts/docs_to_voice.py` flags:
73
+
74
+ - `--project-dir` (required)
75
+ - `--project-name` (optional)
76
+ - `--text` or `--input-file` (exactly one required)
77
+ - `--env-file` (optional, default: `skill_dir/.env`)
78
+ - `--mode` (`say|api`, optional)
79
+ - `--voice` (optional, say mode)
80
+ - `--rate` (optional, say mode)
81
+ - `--speech-rate` (optional, post-process speed multiplier)
82
+ - `--api-endpoint` (optional, api mode)
83
+ - `--api-model` (optional, api mode)
84
+ - `--api-voice` (optional, api mode)
85
+ - `--max-chars` (optional, auto chunking threshold for long text)
86
+ - `--output-name` (optional)
87
+ - `--no-auto-prosody` (optional, say mode)
88
+ - `--force` (optional)
89
+
90
+ Environment variables:
91
+
92
+ - `DOCS_TO_VOICE_MODE`
93
+ - `DOCS_TO_VOICE_VOICE`
94
+ - `DOCS_TO_VOICE_API_ENDPOINT`
95
+ - `DOCS_TO_VOICE_API_MODEL`
96
+ - `DOCS_TO_VOICE_API_VOICE`
97
+ - `DOCS_TO_VOICE_MAX_CHARS`
98
+ - `DOCS_TO_VOICE_SPEECH_RATE`
99
+ - `DASHSCOPE_API_KEY`
100
+
101
+ ## Troubleshooting
102
+
103
+ - `say` mode: confirm `command -v say` and `command -v python3`.
104
+ - `api` mode: confirm `command -v python3` and valid `DASHSCOPE_API_KEY`.
105
+ - Long-text chunk merge (especially AIFF output): recommend `command -v ffmpeg`.
106
+ - If output exists, use `--force` or a new `--output-name`.
107
+ - `scripts/docs_to_voice.sh` is kept as a compatibility wrapper for existing workflows.
@@ -0,0 +1,4 @@
1
+ interface:
2
+ display_name: "Docs to Voice"
3
+ short_description: "Convert text into voice files with macOS say or Model Studio API"
4
+ default_prompt: "Use $docs-to-voice to convert text into an audio file saved under project_dir/audio/{project_name}/, generate sentence timelines (.timeline.json and .srt), synthesize API mode sentence-by-sentence, and concatenate all generated parts into one final audio file."