@laitszkin/apollo-toolkit 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +62 -0
- package/CHANGELOG.md +100 -0
- package/LICENSE +21 -0
- package/README.md +144 -0
- package/align-project-documents/SKILL.md +94 -0
- package/align-project-documents/agents/openai.yaml +4 -0
- package/analyse-app-logs/LICENSE +21 -0
- package/analyse-app-logs/README.md +126 -0
- package/analyse-app-logs/SKILL.md +121 -0
- package/analyse-app-logs/agents/openai.yaml +4 -0
- package/analyse-app-logs/references/investigation-checklist.md +58 -0
- package/analyse-app-logs/references/log-signal-patterns.md +52 -0
- package/answering-questions-with-research/SKILL.md +46 -0
- package/answering-questions-with-research/agents/openai.yaml +4 -0
- package/bin/apollo-toolkit.js +7 -0
- package/commit-and-push/LICENSE +21 -0
- package/commit-and-push/README.md +26 -0
- package/commit-and-push/SKILL.md +70 -0
- package/commit-and-push/agents/openai.yaml +4 -0
- package/commit-and-push/references/branch-naming.md +15 -0
- package/commit-and-push/references/commit-messages.md +19 -0
- package/deep-research-topics/LICENSE +21 -0
- package/deep-research-topics/README.md +43 -0
- package/deep-research-topics/SKILL.md +84 -0
- package/deep-research-topics/agents/openai.yaml +4 -0
- package/develop-new-features/LICENSE +21 -0
- package/develop-new-features/README.md +52 -0
- package/develop-new-features/SKILL.md +105 -0
- package/develop-new-features/agents/openai.yaml +4 -0
- package/develop-new-features/references/testing-e2e.md +35 -0
- package/develop-new-features/references/testing-integration.md +42 -0
- package/develop-new-features/references/testing-property-based.md +44 -0
- package/develop-new-features/references/testing-unit.md +37 -0
- package/discover-edge-cases/CHANGELOG.md +19 -0
- package/discover-edge-cases/LICENSE +21 -0
- package/discover-edge-cases/README.md +87 -0
- package/discover-edge-cases/SKILL.md +124 -0
- package/discover-edge-cases/agents/openai.yaml +4 -0
- package/discover-edge-cases/references/architecture-edge-cases.md +41 -0
- package/discover-edge-cases/references/code-edge-cases.md +46 -0
- package/docs-to-voice/.env.example +106 -0
- package/docs-to-voice/CHANGELOG.md +71 -0
- package/docs-to-voice/LICENSE +21 -0
- package/docs-to-voice/README.md +118 -0
- package/docs-to-voice/SKILL.md +107 -0
- package/docs-to-voice/agents/openai.yaml +4 -0
- package/docs-to-voice/scripts/docs_to_voice.py +1385 -0
- package/docs-to-voice/scripts/docs_to_voice.sh +11 -0
- package/docs-to-voice/tests/test_docs_to_voice_api_max_chars.py +210 -0
- package/docs-to-voice/tests/test_docs_to_voice_sentence_timeline.py +115 -0
- package/docs-to-voice/tests/test_docs_to_voice_settings.py +43 -0
- package/docs-to-voice/tests/test_docs_to_voice_speech_rate.py +57 -0
- package/enhance-existing-features/CHANGELOG.md +35 -0
- package/enhance-existing-features/LICENSE +21 -0
- package/enhance-existing-features/README.md +54 -0
- package/enhance-existing-features/SKILL.md +120 -0
- package/enhance-existing-features/agents/openai.yaml +4 -0
- package/enhance-existing-features/references/e2e-tests.md +25 -0
- package/enhance-existing-features/references/integration-tests.md +30 -0
- package/enhance-existing-features/references/property-based-tests.md +33 -0
- package/enhance-existing-features/references/unit-tests.md +29 -0
- package/feature-propose/LICENSE +21 -0
- package/feature-propose/README.md +23 -0
- package/feature-propose/SKILL.md +107 -0
- package/feature-propose/agents/openai.yaml +4 -0
- package/feature-propose/references/enhancement-features.md +25 -0
- package/feature-propose/references/important-features.md +25 -0
- package/feature-propose/references/mvp-features.md +25 -0
- package/feature-propose/references/performance-features.md +25 -0
- package/financial-research/SKILL.md +208 -0
- package/financial-research/agents/openai.yaml +4 -0
- package/financial-research/assets/weekly_market_report_template.md +45 -0
- package/fix-github-issues/SKILL.md +98 -0
- package/fix-github-issues/agents/openai.yaml +4 -0
- package/fix-github-issues/scripts/list_issues.py +148 -0
- package/fix-github-issues/tests/test_list_issues.py +127 -0
- package/generate-spec/LICENSE +21 -0
- package/generate-spec/README.md +61 -0
- package/generate-spec/SKILL.md +96 -0
- package/generate-spec/agents/openai.yaml +4 -0
- package/generate-spec/references/templates/checklist.md +78 -0
- package/generate-spec/references/templates/spec.md +55 -0
- package/generate-spec/references/templates/tasks.md +35 -0
- package/generate-spec/scripts/create-specs +123 -0
- package/harden-app-security/CHANGELOG.md +27 -0
- package/harden-app-security/LICENSE +21 -0
- package/harden-app-security/README.md +46 -0
- package/harden-app-security/SKILL.md +127 -0
- package/harden-app-security/agents/openai.yaml +4 -0
- package/harden-app-security/references/agent-attack-catalog.md +117 -0
- package/harden-app-security/references/common-software-attack-catalog.md +168 -0
- package/harden-app-security/references/red-team-extreme-scenarios.md +81 -0
- package/harden-app-security/references/risk-checklist.md +78 -0
- package/harden-app-security/references/security-test-patterns-agent.md +101 -0
- package/harden-app-security/references/security-test-patterns-finance.md +88 -0
- package/harden-app-security/references/test-snippets.md +73 -0
- package/improve-observability/SKILL.md +114 -0
- package/improve-observability/agents/openai.yaml +4 -0
- package/learn-skill-from-conversations/CHANGELOG.md +15 -0
- package/learn-skill-from-conversations/LICENSE +22 -0
- package/learn-skill-from-conversations/README.md +47 -0
- package/learn-skill-from-conversations/SKILL.md +85 -0
- package/learn-skill-from-conversations/agents/openai.yaml +4 -0
- package/learn-skill-from-conversations/scripts/extract_recent_conversations.py +369 -0
- package/learn-skill-from-conversations/tests/test_extract_recent_conversations.py +176 -0
- package/learning-error-book/SKILL.md +112 -0
- package/learning-error-book/agents/openai.yaml +4 -0
- package/learning-error-book/assets/error_book_template.md +66 -0
- package/learning-error-book/scripts/render_markdown_to_pdf.py +367 -0
- package/lib/cli.js +338 -0
- package/lib/installer.js +225 -0
- package/maintain-project-constraints/SKILL.md +109 -0
- package/maintain-project-constraints/agents/openai.yaml +4 -0
- package/maintain-skill-catalog/README.md +18 -0
- package/maintain-skill-catalog/SKILL.md +66 -0
- package/maintain-skill-catalog/agents/openai.yaml +4 -0
- package/novel-to-short-video/CHANGELOG.md +53 -0
- package/novel-to-short-video/LICENSE +21 -0
- package/novel-to-short-video/README.md +63 -0
- package/novel-to-short-video/SKILL.md +233 -0
- package/novel-to-short-video/agents/openai.yaml +4 -0
- package/novel-to-short-video/references/plan-template.md +71 -0
- package/novel-to-short-video/references/roles-json.md +41 -0
- package/open-github-issue/LICENSE +21 -0
- package/open-github-issue/README.md +97 -0
- package/open-github-issue/SKILL.md +119 -0
- package/open-github-issue/agents/openai.yaml +4 -0
- package/open-github-issue/scripts/open_github_issue.py +380 -0
- package/open-github-issue/tests/test_open_github_issue.py +159 -0
- package/open-source-pr-workflow/CHANGELOG.md +32 -0
- package/open-source-pr-workflow/LICENSE +21 -0
- package/open-source-pr-workflow/README.md +23 -0
- package/open-source-pr-workflow/SKILL.md +123 -0
- package/open-source-pr-workflow/agents/openai.yaml +4 -0
- package/openai-text-to-image-storyboard/.env.example +10 -0
- package/openai-text-to-image-storyboard/CHANGELOG.md +49 -0
- package/openai-text-to-image-storyboard/LICENSE +21 -0
- package/openai-text-to-image-storyboard/README.md +99 -0
- package/openai-text-to-image-storyboard/SKILL.md +107 -0
- package/openai-text-to-image-storyboard/agents/openai.yaml +4 -0
- package/openai-text-to-image-storyboard/scripts/generate_storyboard_images.py +763 -0
- package/package.json +36 -0
- package/record-spending/SKILL.md +113 -0
- package/record-spending/agents/openai.yaml +4 -0
- package/record-spending/references/account-format.md +33 -0
- package/record-spending/references/workbook-layout.md +84 -0
- package/resolve-review-comments/SKILL.md +122 -0
- package/resolve-review-comments/agents/openai.yaml +4 -0
- package/resolve-review-comments/references/adoption-criteria.md +23 -0
- package/resolve-review-comments/scripts/review_threads.py +425 -0
- package/resolve-review-comments/tests/test_review_threads.py +74 -0
- package/review-change-set/LICENSE +21 -0
- package/review-change-set/README.md +55 -0
- package/review-change-set/SKILL.md +103 -0
- package/review-change-set/agents/openai.yaml +4 -0
- package/review-codebases/LICENSE +21 -0
- package/review-codebases/README.md +67 -0
- package/review-codebases/SKILL.md +109 -0
- package/review-codebases/agents/openai.yaml +4 -0
- package/scripts/install_skills.ps1 +283 -0
- package/scripts/install_skills.sh +262 -0
- package/scripts/validate_openai_agent_config.py +194 -0
- package/scripts/validate_skill_frontmatter.py +110 -0
- package/specs-to-project-docs/LICENSE +21 -0
- package/specs-to-project-docs/README.md +57 -0
- package/specs-to-project-docs/SKILL.md +111 -0
- package/specs-to-project-docs/agents/openai.yaml +4 -0
- package/specs-to-project-docs/references/templates/architecture.md +29 -0
- package/specs-to-project-docs/references/templates/configuration.md +29 -0
- package/specs-to-project-docs/references/templates/developer-guide.md +33 -0
- package/specs-to-project-docs/references/templates/docs-index.md +39 -0
- package/specs-to-project-docs/references/templates/features.md +25 -0
- package/specs-to-project-docs/references/templates/getting-started.md +38 -0
- package/specs-to-project-docs/references/templates/readme.md +49 -0
- package/systematic-debug/LICENSE +21 -0
- package/systematic-debug/README.md +81 -0
- package/systematic-debug/SKILL.md +59 -0
- package/systematic-debug/agents/openai.yaml +4 -0
- package/text-to-short-video/.env.example +36 -0
- package/text-to-short-video/LICENSE +21 -0
- package/text-to-short-video/README.md +82 -0
- package/text-to-short-video/SKILL.md +221 -0
- package/text-to-short-video/agents/openai.yaml +4 -0
- package/text-to-short-video/scripts/enforce_video_aspect_ratio.py +350 -0
- package/version-release/CHANGELOG.md +53 -0
- package/version-release/LICENSE +21 -0
- package/version-release/README.md +28 -0
- package/version-release/SKILL.md +94 -0
- package/version-release/agents/openai.yaml +4 -0
- package/version-release/references/branch-naming.md +15 -0
- package/version-release/references/changelog-writing.md +8 -0
- package/version-release/references/commit-messages.md +19 -0
- package/version-release/references/readme-writing.md +12 -0
- package/version-release/references/semantic-versioning.md +12 -0
- package/video-production/CHANGELOG.md +104 -0
- package/video-production/LICENSE +18 -0
- package/video-production/README.md +68 -0
- package/video-production/SKILL.md +213 -0
- package/video-production/agents/openai.yaml +4 -0
- package/video-production/references/plan-template.md +54 -0
- package/video-production/references/roles-json.md +41 -0
- package/weekly-financial-event-report/SKILL.md +195 -0
- package/weekly-financial-event-report/agents/openai.yaml +4 -0
- package/weekly-financial-event-report/assets/financial_event_report_template.md +53 -0
package/discover-edge-cases/references/code-edge-cases.md
@@ -0,0 +1,46 @@
+# Common Code-level Edge Cases (Reference List)
+
+## How to use
+- Pick only 2-5 items directly related to the current change.
+- Prioritize observable failures and high-risk inputs.
+
+## Input and typing
+- Null/missing fields: None/null, empty string, empty collection
+- Unexpected types: string-number mixing, boolean-integer confusion
+- Oversized input: long strings, large arrays, deeply nested objects
+- Encoding issues: UTF-8/non-ASCII, invisible characters
+
+## Boundaries and numerics
+- Off-by-one: index 0/1 and length boundaries
+- Overflow/underflow: integer/timestamp boundaries
+- NaN/Inf: floating-point special values
+- Precision loss: money/ratio calculations
+- Negative values where invalid
+
+## Structure and ordering
+- Duplicate elements: dedup/accumulation logic
+- Ordering assumptions: sorting stability, input-order dependence
+- Empty/singleton collections: reduce/min/max/avg behavior
+- Mutable/immutable mismatch: in-place mutation of input data
+
+## Exceptions and error handling
+- Parsing failures: date/timezone, JSON, CSV
+- External dependency failures: 429/500/timeout
+- Swallowed errors: `except: pass` or missing logs
+- Recovery strategy: retry count, backoff, degradation
+
+## State and side effects
+- Reentrancy: same request invoked multiple times
+- Global state contamination: cache/singleton bleed-through
+- Mutable default parameters: Python list/dict defaults
+- Resource release: file/connection not closed
+
+## Security and validation
+- Insufficient authorization behavior
+- Validation bypass via null/0/False
+- Path/injection risks from string concatenation
+
+## Performance and limits
+- N+1 query patterns inside loops
+- Large-data stress: timeout/memory pressure
+- Hotspots: lock contention under high-frequency calls
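Two of the checklist items above (mutable default parameters and empty/singleton-collection reduction) are easy to demonstrate in a few lines. This is an illustrative sketch only; the function names are hypothetical and not from the package:

```python
def append_tag(tag, tags=[]):
    # BUG: the default list is created once and shared across calls,
    # so earlier tags leak into later, unrelated calls.
    tags.append(tag)
    return tags


def append_tag_fixed(tag, tags=None):
    # Fix: use None as the sentinel and build a fresh list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


def safe_max(values):
    # Empty-collection edge case: max() raises ValueError on an empty
    # iterable unless a default is supplied.
    return max(values, default=None)
```

Running `append_tag` twice shows the shared-state leak that the fixed version avoids.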
package/docs-to-voice/.env.example
@@ -0,0 +1,106 @@
+# docs-to-voice environment example
+#
+# MODE:
+# - say: macOS built-in `say`
+# - api: Alibaba Cloud Model Studio (Bailian) TTS API
+DOCS_TO_VOICE_MODE="say"
+
+# -----------------------------
+# say mode defaults
+# -----------------------------
+# If --voice is provided, it overrides this value.
+DOCS_TO_VOICE_VOICE="Eddy (中文(台灣))"
+
+# -----------------------------
+# api mode defaults (Model Studio)
+# -----------------------------
+# Required for mode=api.
+DASHSCOPE_API_KEY=""
+#
+# Region endpoint (direct Model Studio API; choose ONE):
+# - China Mainland (Beijing):
+#   https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
+# - International (Singapore):
+#   https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
+# - US (Virginia):
+#   https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
+#
+# NOTE:
+# - API Key and endpoint region must match.
+# - If you use OpenAI-compatible mode in other clients, base URL is:
+#   - https://dashscope.aliyuncs.com/compatible-mode/v1
+#   - https://dashscope-intl.aliyuncs.com/compatible-mode/v1
+#   - https://dashscope-us.aliyuncs.com/compatible-mode/v1
+DOCS_TO_VOICE_API_ENDPOINT="https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation"
+
+# Model options (choose one available in your region/account):
+# - qwen3-tts
+# - qwen3-tts-flash
+# - qwen3-tts-instruct-flash
+# - qwen-tts
+# - qwen-tts-latest
+DOCS_TO_VOICE_API_MODEL="qwen3-tts"
+
+# Voice options (DOCS_TO_VOICE_API_VOICE) and style:
+# - Cherry: 芊悅 | sunny, upbeat, warm and natural young woman (female)
+# - Serena: 蘇瑤 | gentle young woman (female)
+# - Ethan: 晨煦 | sunny, warm, energetic male voice (male)
+# - Chelsie: 千雪 | anime-style virtual girlfriend (female)
+# - Momo: 茉兔 | playful and coquettish (female)
+# - Vivian: 十三 | sassy, cute, slightly hot-tempered (female)
+# - Moon: 月白 | free-spirited and cool (male)
+# - Maia: 四月 | intellectual and gentle (female)
+# - Kai: 凱 | general-purpose neutral
+# - Nofish: 不吃魚 | a designer who can't pronounce retroflex sounds (male)
+# - Bella: 萌寶 | cute little-girl feel (female)
+# - Jennifer: 詹妮弗 | polished American-English female voice (female)
+# - Ryan: 甜茶 | strong rhythm, plenty of dramatic flair (male)
+# - Katerina: 卡捷琳娜 | mature, commanding female timbre (female)
+# - Aiden: 艾登 | warm American-English male voice (male)
+# - Eldric Sage: 滄明子 | calm, wise elder (male)
+# - Mia: 乖小妹 | docile and well-behaved (female)
+# - Mochi: 沙小彌 | precocious child voice (male)
+# - Bellona: 燕錚鶯 | resonant, strong dramatic tension (female)
+# - Vincent: 田叔 | husky, smoky voice (male)
+# - Bunny: 萌小姬 | cutesy lolita style (female)
+# - Neil: 阿聞 | news-anchor style (male)
+# - Elias: 墨講師 | intellectual lecturer (female)
+# - Arthur: 徐大爺 | plainspoken elder narration (male)
+# - Nini: 鄰家妹妹 | sweet girl-next-door (female)
+# - Ebona: 詭婆婆 | eerie and sinister (female)
+# - Seren: 小婉 | soothing, sleep-aid voice (female)
+# - Pip: 頑屁小孩 | mischievous, childlike (male)
+# - Stella: 少女阿月 | energetic teenage girl (female)
+# - Bodega: 博德加 | enthusiastic Spanish-speaking uncle (male)
+# - Sonrisa: 索尼莎 | passionate Latin American female voice (female)
+# - Alek: 阿列克 | Russian male voice with a cool-warm contrast (male)
+# - Dolce: 多爾切 | laid-back Italian uncle (male)
+# - Sohee: 素熙 | gentle, cheerful Korean-style female voice (female)
+# - Ono Anna: 小野杏 | impish Japanese-style female voice (female)
+# - Lenn: 萊恩 | rational German-style young man (male)
+# - Emilien: 埃米爾安 | romantic French-style male voice (male)
+# - Andre: 安德雷 | calm, magnetic male voice (male)
+# - Radio Gol: 拉迪奧·戈爾 | football-commentary style
+# - Jada: 上海-阿珍 | Shanghai dialect (female)
+# - Dylan: 北京-曉東 | Beijing dialect (male)
+# - Li: 南京-老李 | Nanjing dialect (male)
+# - Marcus: 陝西-秦川 | Shaanxi dialect (male)
+# - Roy: 閩南-阿杰 | Hokkien (male)
+# - Peter: 天津-李彼得 | Tianjin dialect, crosstalk style (male)
+# - Sunny: 四川-晴兒 | sweet Sichuan-dialect female voice (female)
+# - Eric: 四川-程川 | Sichuan dialect (male)
+# - Rocky: 粵語-阿強 | Cantonese (male)
+# - Kiki: 粵語-阿清 | Cantonese (female)
+DOCS_TO_VOICE_API_VOICE="Cherry"
+
+# Long text chunking:
+# - Empty means api mode auto-discovers model max input length before chunking.
+# - Set a positive integer to force a custom chunk limit.
+# - 0 disables chunking.
+DOCS_TO_VOICE_MAX_CHARS=""
+
+# Optional post-process speech speed multiplier:
+# - Empty keeps original speed.
+# - Set a positive number (e.g. 1.2 faster, 0.8 slower).
+# - Requires ffmpeg when value is not 1.
+DOCS_TO_VOICE_SPEECH_RATE=""
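The `DOCS_TO_VOICE_SPEECH_RATE` comment above says ffmpeg is required whenever the multiplier is not 1. One common way to implement such a multiplier is ffmpeg's `atempo` audio filter, which accepts roughly 0.5-2.0 per filter instance, so larger or smaller factors are expressed as a chain of in-range stages. The helper below is a hypothetical sketch of that mapping, not the package's implementation:

```python
import math


def atempo_chain(rate: float) -> str:
    """Build an ffmpeg -af atempo filter chain for a speed multiplier.

    atempo accepts roughly 0.5-2.0 per instance, so out-of-range factors
    are decomposed into a product of in-range stages.
    """
    if not (rate > 0 and math.isfinite(rate)):
        # Mirrors the 0.5.0 fix: reject nan/inf/non-positive rates early.
        raise ValueError("speech rate must be a positive finite number")
    stages = []
    remaining = rate
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:
        stages.append(0.5)
        remaining /= 0.5
    stages.append(remaining)
    return ",".join(f"atempo={s:g}" for s in stages)
```

For example, a 3x speed-up becomes `atempo=2,atempo=1.5`, which could be passed to `ffmpeg -i in.wav -af "<chain>" out.wav`.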
package/docs-to-voice/CHANGELOG.md
@@ -0,0 +1,71 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on Keep a Changelog.
+
+## [Unreleased]
+
+## [0.5.0] - 2026-02-18
+
+### Added
+- Added `--speech-rate` (and `DOCS_TO_VOICE_SPEECH_RATE`) for optional ffmpeg-based post-process speed adjustment.
+- Added speech-rate unit tests for input validation, filter-chain generation, and timeline duration scaling.
+
+### Fixed
+- Rejected non-finite `--speech-rate` values (for example `nan`/`inf`) to avoid invalid ffmpeg filter generation.
+
+## [0.4.0] - 2026-02-15
+
+### Changed
+- API mode now sends TTS requests sentence-by-sentence and still merges all generated audio into one final output file.
+- API sentence timelines now prefer measured per-sentence audio durations to produce more precise subtitle timestamps.
+
+### Added
+- Added `split_text_into_api_sentence_requests` to keep sentence boundaries in API synthesis while still splitting only oversized sentences by `--max-chars`.
+- Added timeline tests covering sentence-based API request splitting and sentence-duration timestamp output.
+
+## [0.3.0] - 2026-02-15
+
+### Added
+- Added API-mode max-length discovery that first checks model catalog metadata and then probes API length validation when metadata is unavailable.
+
+### Changed
+- API mode now auto-splits long text by discovered max input length when `--max-chars` (or `DOCS_TO_VOICE_MAX_CHARS`) is not provided.
+- API chunking now applies weighted input-unit counting for CJK text to match qwen3-tts length rules.
+
+### Fixed
+- Added explicit `0` handling for `--max-chars` / `DOCS_TO_VOICE_MAX_CHARS` so users can still disable chunking.
+- Wrapped transient HTTP disconnect exceptions as user-facing CLI errors instead of raw tracebacks.
+
+## [0.2.1] - 2026-02-15
+
+### Added
+- Added unit tests for `resolve_setting` to lock configuration precedence and fallback behavior.
+
+### Fixed
+- Fixed configuration precedence so omitted CLI mode/options now prefer `.env` values before shell environment variables.
+- Prevented blank CLI values from incorrectly overriding `.env` settings.
+
+## [0.2.0] - 2026-02-14
+
+### Added
+- Added `scripts/docs_to_voice.py` as the primary CLI implementation for both `say` and `api` modes.
+- Added built-in Python API handling for Model Studio requests/responses (URL and base64 audio paths) to simplify API mode operations.
+- Added long-text chunking via `--max-chars` and `DOCS_TO_VOICE_MAX_CHARS`, with automatic per-chunk synthesis and final audio concatenation.
+
+### Changed
+- Replaced `scripts/docs_to_voice.sh` logic with a lightweight compatibility wrapper that delegates to the Python CLI.
+- Updated README, SKILL, `.env.example`, and agent prompt guidance to use the Python script as the default entrypoint and document chunked generation for long text.
+
+## [0.1.0] - 2026-02-14
+
+### Added
+- Added `api` mode for Alibaba Cloud Model Studio TTS, including endpoint/model/voice configuration and API key support.
+- Added sentence-level timeline outputs for each generated audio file: `.timeline.json` and `.srt`.
+- Added timeline metadata with per-sentence start/end offsets (seconds and milliseconds) to support subtitle alignment.
+
+### Changed
+- Updated `.env.example` with full mode selection, endpoint guidance, and API voice/model defaults.
+- Updated skill and README documentation to describe dual-mode generation and subtitle timeline artifacts.
+- Updated agent default prompt to include timeline generation behavior.
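The 0.2.1 entries above describe a `resolve_setting` precedence rule: an explicit CLI value wins, a blank CLI value is treated as unset, then `.env` values are preferred over same-name shell environment variables. A minimal sketch of that rule, assuming a pre-parsed `.env` dict; the package's actual function signature and behavior may differ:

```python
import os


def resolve_setting(name, cli_value=None, dotenv=None):
    """Resolve a setting: CLI flag first, then .env, then shell environment.

    Blank CLI values are treated as unset so they cannot mask .env entries,
    mirroring the 0.2.1 precedence fix described in the changelog.
    """
    if cli_value is not None and str(cli_value).strip() != "":
        return cli_value
    dotenv = dotenv or {}
    if dotenv.get(name, "").strip():
        return dotenv[name]
    # Fall back to the shell environment; empty means unset.
    return os.environ.get(name, "") or None
```

Keeping blank-string handling explicit is the design point: `--mode ""` should behave like an omitted flag, not an override.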
package/docs-to-voice/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 tszkinlai
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/docs-to-voice/README.md
@@ -0,0 +1,118 @@
+# docs-to-voice
+
+Convert text into voice files and always write outputs to:
+
+`project_dir/audio/{project_name}/`
+
+Each run also generates subtitle timeline files in the same folder:
+
+- `{audio_name_without_extension}.timeline.json`: sentence-level subtitle start/end timestamps (seconds and milliseconds)
+- `{audio_name_without_extension}.srt`: ready-to-use subtitle track file
+
+Two modes are supported:
+
+- `say`: built-in macOS `say`
+- `api`: Alibaba Cloud Model Studio TTS API (for example `qwen3-tts`)
+
+## Features
+
+- Supports both `--text` and `--input-file`
+- Supports `--mode say|api`
+- Supports `.env` for API key/model/voice settings
+- Supports `--output-name` for a deterministic output filename
+- `say` mode enables punctuation prosody enhancement by default (disable with `--no-auto-prosody`)
+- `api` mode sends sentence-level TTS requests and merges output into one audio file
+- `api` mode auto-detects model max input length and segments over-limit sentences
+- `--max-chars` (or `.env`) can override the segment length limit manually
+- `--speech-rate` (or `.env`) can apply post-process speech-rate adjustment (requires `ffmpeg`)
+- API timeline prefers real per-sentence audio duration for higher subtitle precision
+- Automatically outputs sentence-level timeline files (`.timeline.json` + `.srt`)
+
+## Requirements
+
+- `say` mode: macOS + `say` + `python3`
+- `api` mode: `python3` + Model Studio API key
+- Long-text merge workflow: `ffmpeg` recommended (especially for AIFF output)
+
+## Quick start
+
+### 1) say mode
+
+```bash
+python3 scripts/docs_to_voice.py \
+  --project-dir "/path/to/project" \
+  --mode say \
+  --text "Hello, this is a voice synthesis test."
+```
+
+### 2) api mode (Model Studio)
+
+```bash
+python3 scripts/docs_to_voice.py \
+  --project-dir "/path/to/project" \
+  --mode api \
+  --text "Hello, this is a qwen3-tts test."
+```
+
+> Compatibility note: `scripts/docs_to_voice.sh` still works and internally delegates to the Python script.
+
+## `.env` settings
+
+1. Copy the template
+
+```bash
+cp .env.example .env
+```
+
+2. Configure mode and API parameters (example)
+
+```env
+DOCS_TO_VOICE_MODE="api"
+DASHSCOPE_API_KEY="sk-xxx"
+DOCS_TO_VOICE_API_ENDPOINT="https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation"
+DOCS_TO_VOICE_API_MODEL="qwen3-tts"
+DOCS_TO_VOICE_API_VOICE="Cherry"
+DOCS_TO_VOICE_MAX_CHARS=""
+DOCS_TO_VOICE_SPEECH_RATE=""
+```
+
+> CLI args `--mode`, `--api-model`, `--api-voice`, `--api-endpoint`, and `--speech-rate` override `.env`. If omitted, values are loaded from `.env` first, then fall back to same-name shell environment variables.
+
+## Parameters
+
+```text
+--project-dir DIR    required, project root path
+--text TEXT          choose one of --text / --input-file
+--input-file FILE    choose one of --text / --input-file
+--project-name NAME  optional, defaults to project_dir folder name
+--output-name NAME   optional, defaults to voice-YYYYmmdd-HHMMSS + mode extension
+--env-file FILE      optional, default: skill-folder/.env
+--mode MODE          optional, say or api
+--voice NAME         optional for say mode
+--rate N             optional for say mode
+--speech-rate N      optional, speech-rate multiplier (>0; 1.2=faster, 0.8=slower; requires ffmpeg)
+--api-endpoint URL   optional for api mode (Model Studio endpoint)
+--api-model NAME     optional for api mode
+--api-voice NAME     optional for api mode
+--max-chars N        optional, manual segment limit (api mode auto-detects if omitted; 0 means no segmentation)
+--no-auto-prosody    optional for say mode, disable punctuation prosody enhancement
+--force              optional, overwrite existing files
+```
+
+## Long-text notes
+
+- API mode uses sentence-level TTS by default and only segments when a sentence exceeds model limits.
+- If you already know a suitable limit, set `--max-chars` (or `DOCS_TO_VOICE_MAX_CHARS` in `.env`).
+- For many `qwen3-tts` models, CJK chars typically count as 2 units and other chars as 1 unit; this script segments using that convention.
+
+## Endpoint note
+
+If you need direct Model Studio `qwen3-tts` integration, use:
+
+`https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation`
+
+`https://dashscope-intl.aliyuncs.com/compatible-mode/v1` is the OpenAI-compatible base URL (typically for chat/completions), not the default direct Model Studio TTS endpoint used by this script.
+
+## License
+
+MIT License (see `LICENSE`).
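The README's long-text note says that for many `qwen3-tts` models, CJK characters typically count as 2 input units and other characters as 1. A weighted counter along those lines might look like the sketch below; the exact Unicode ranges the script actually checks are not shown in this diff, so the ranges here (CJK Unified Ideographs plus CJK punctuation) are an assumption:

```python
def weighted_length(text: str) -> int:
    """Approximate qwen3-tts input-unit counting: CJK chars weigh 2, others 1.

    ASSUMPTION: only the main CJK Unified Ideographs block (U+4E00-U+9FFF)
    and CJK punctuation (U+3000-U+303F) are treated as 2-unit characters.
    """
    def weight(ch: str) -> int:
        code = ord(ch)
        if 0x4E00 <= code <= 0x9FFF or 0x3000 <= code <= 0x303F:
            return 2
        return 1

    return sum(weight(ch) for ch in text)
```

A chunker would compare this weighted length, not `len(text)`, against the discovered model limit before deciding whether a sentence needs splitting.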
package/docs-to-voice/SKILL.md
@@ -0,0 +1,107 @@
+---
+name: docs-to-voice
+description: Convert text and document content into audio files and sentence-level subtitle timelines under project_dir/audio/{project_name}/. Supports both macOS say and Alibaba Cloud Model Studio API modes.
+---
+
+# Docs to Voice
+
+## Dependencies
+
+- Required: none.
+- Conditional: none.
+- Optional: none.
+- Fallback: not applicable.
+
+## Standards
+
+- Evidence: Confirm `project_dir`, input source, mode, and environment-backed settings before generation.
+- Execution: Use `scripts/docs_to_voice.py` to write audio plus matching timeline and subtitle files under `project_dir/audio/{project_name}/`.
+- Quality: Respect mode-specific options, sentence splitting rules, and post-process requirements such as `ffmpeg` for speed changes.
+- Output: Return the absolute output audio path together with the generated `.timeline.json` and `.srt` companions.
+
+## Overview
+
+Use `scripts/docs_to_voice.py` to convert raw text or text files into audio and always save under:
+
+`project_dir/audio/{project_name}/`
+
+Alongside each audio file, the script also writes:
+
+- `{audio_name_without_extension}.timeline.json`
+- `{audio_name_without_extension}.srt`
+
+Modes:
+
+- `say`: local macOS `say`
+- `api`: Alibaba Cloud Model Studio TTS API (for example `qwen3-tts`)
+
+## Workflow
+
+1. Collect inputs.
+   - Require `project_dir`.
+   - Accept either raw text or one input text file.
+   - Set `project_name`; default to the basename of `project_dir`.
+
+2. Select mode.
+   - `--mode say` for local generation.
+   - `--mode api` for Model Studio API generation.
+   - If omitted, load `DOCS_TO_VOICE_MODE` from `.env`, then shell environment variables; fall back to `say`.
+
+3. Prepare the output path.
+   - Build `project_dir/audio/{project_name}/`.
+   - Create the directory if it does not exist.
+
+4. Generate audio.
+   - `say` mode supports `--voice`, `--rate`, and punctuation-pause enhancement.
+   - `api` mode supports `--api-endpoint`, `--api-model`, `--api-voice`, and reads `DASHSCOPE_API_KEY`.
+   - `api` mode sends one request per sentence and concatenates all sentence audio into one final file.
+   - `api` mode auto-discovers model max input length; only oversized sentences are split by that limit.
+   - `--max-chars` (or `DOCS_TO_VOICE_MAX_CHARS`) can override the sentence split limit; `0` disables chunking.
+   - `--speech-rate` (or `DOCS_TO_VOICE_SPEECH_RATE`) applies optional post-process speed adjustment and requires `ffmpeg` when the value is not `1`.
+   - API splitting uses model counting rules (for `qwen3-tts`, CJK chars count as 2 units).
+
+5. Generate sentence-level timeline files.
+   - Write JSON timeline and SRT subtitle files next to the audio output.
+   - In `api` mode, timeline start/end uses per-sentence audio durations whenever available.
+
+6. Return completion details.
+   - Report the absolute output audio path.
+
+## Script Reference
+
+`scripts/docs_to_voice.py` flags:
+
+- `--project-dir` (required)
+- `--project-name` (optional)
+- `--text` or `--input-file` (exactly one required)
+- `--env-file` (optional, default: `skill_dir/.env`)
+- `--mode` (`say|api`, optional)
+- `--voice` (optional, say mode)
+- `--rate` (optional, say mode)
+- `--speech-rate` (optional, post-process speed multiplier)
+- `--api-endpoint` (optional, api mode)
+- `--api-model` (optional, api mode)
+- `--api-voice` (optional, api mode)
+- `--max-chars` (optional, auto chunking threshold for long text)
+- `--output-name` (optional)
+- `--no-auto-prosody` (optional, say mode)
+- `--force` (optional)
+
+Environment variables:
+
+- `DOCS_TO_VOICE_MODE`
+- `DOCS_TO_VOICE_VOICE`
+- `DOCS_TO_VOICE_API_ENDPOINT`
+- `DOCS_TO_VOICE_API_MODEL`
+- `DOCS_TO_VOICE_API_VOICE`
+- `DOCS_TO_VOICE_MAX_CHARS`
+- `DOCS_TO_VOICE_SPEECH_RATE`
+- `DASHSCOPE_API_KEY`
+
+## Troubleshooting
+
+- `say` mode: confirm `command -v say` and `command -v python3`.
+- `api` mode: confirm `command -v python3` and a valid `DASHSCOPE_API_KEY`.
+- Long-text chunk merge (especially AIFF output): `command -v ffmpeg` is recommended.
+- If the output exists, use `--force` or a new `--output-name`.
+- `scripts/docs_to_voice.sh` is kept as a compatibility wrapper for existing workflows.
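The workflow's step 5 writes `.srt` subtitles from per-sentence start/end offsets, preferring measured audio durations. The SKILL does not show the writer itself; below is a minimal sketch of SRT cue assembly from measured per-sentence durations, with hypothetical helper names:

```python
def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(sentences, durations_ms):
    """Assemble SRT cues by accumulating measured per-sentence durations."""
    cues, start = [], 0
    for i, (text, dur) in enumerate(zip(sentences, durations_ms), start=1):
        end = start + dur
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
        start = end  # next sentence begins where this one ended
    return "\n".join(cues)
```

Accumulating measured durations, rather than estimating from character counts, is what keeps subtitle timestamps aligned with the concatenated audio.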
package/docs-to-voice/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Docs to Voice"
+  short_description: "Convert text into voice files with macOS say or Model Studio API"
+  default_prompt: "Use $docs-to-voice to convert text into an audio file saved under project_dir/audio/{project_name}/, generate sentence timelines (.timeline.json and .srt), synthesize API mode sentence-by-sentence, and concatenate all generated parts into one final audio file."