agent-cli 0.70.2-py3-none-any.whl → 0.72.1-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. agent_cli/_extras.json +4 -3
  2. agent_cli/_requirements/memory.txt +14 -1
  3. agent_cli/_requirements/rag.txt +14 -1
  4. agent_cli/_requirements/vad.txt +1 -85
  5. agent_cli/_requirements/wyoming.txt +71 -0
  6. agent_cli/agents/assistant.py +24 -28
  7. agent_cli/agents/autocorrect.py +30 -4
  8. agent_cli/agents/chat.py +45 -15
  9. agent_cli/agents/memory/__init__.py +19 -1
  10. agent_cli/agents/memory/add.py +3 -3
  11. agent_cli/agents/memory/proxy.py +20 -11
  12. agent_cli/agents/rag_proxy.py +42 -10
  13. agent_cli/agents/speak.py +23 -3
  14. agent_cli/agents/transcribe.py +21 -3
  15. agent_cli/agents/transcribe_daemon.py +34 -22
  16. agent_cli/agents/voice_edit.py +18 -10
  17. agent_cli/cli.py +25 -2
  18. agent_cli/config_cmd.py +30 -11
  19. agent_cli/core/deps.py +6 -3
  20. agent_cli/core/transcription_logger.py +1 -1
  21. agent_cli/core/vad.py +6 -24
  22. agent_cli/dev/cli.py +295 -65
  23. agent_cli/docs_gen.py +18 -8
  24. agent_cli/install/extras.py +44 -13
  25. agent_cli/install/hotkeys.py +22 -11
  26. agent_cli/install/services.py +54 -14
  27. agent_cli/opts.py +43 -22
  28. agent_cli/server/cli.py +128 -62
  29. agent_cli/server/proxy/api.py +77 -19
  30. agent_cli/services/__init__.py +46 -5
  31. {agent_cli-0.70.2.dist-info → agent_cli-0.72.1.dist-info}/METADATA +627 -246
  32. {agent_cli-0.70.2.dist-info → agent_cli-0.72.1.dist-info}/RECORD +35 -34
  33. {agent_cli-0.70.2.dist-info → agent_cli-0.72.1.dist-info}/WHEEL +0 -0
  34. {agent_cli-0.70.2.dist-info → agent_cli-0.72.1.dist-info}/entry_points.txt +0 -0
  35. {agent_cli-0.70.2.dist-info → agent_cli-0.72.1.dist-info}/licenses/LICENSE +0 -0
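The METADATA diff below documents a TOML config file searched first at ./agent-cli-config.toml, then ~/.config/agent-cli/config.toml, with a [defaults] section plus per-command override sections. A minimal sketch of that layout (the key names mirror the CLI flag names and are assumptions, not verified against the package):

```toml
# ~/.config/agent-cli/config.toml -- illustrative sketch only
[defaults]
llm-provider = "ollama"          # 'ollama', 'openai', or 'gemini'
llm-ollama-model = "gemma3:4b"

[transcribe]
llm = true                       # per-command override of [defaults]
```

Per the config help text in this diff, CLI arguments still override anything set in the file.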
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: agent-cli
- Version: 0.70.2
+ Version: 0.72.1
  Summary: A suite of AI-powered command-line tools for text correction, audio transcription, and voice assistance.
  Project-URL: Homepage, https://github.com/basnijholt/agent-cli
  Author-email: Bas Nijholt <bas@nijho.lt>
@@ -51,6 +51,7 @@ Requires-Dist: chromadb>=0.4.22; extra == 'memory'
  Requires-Dist: fastapi[standard]; extra == 'memory'
  Requires-Dist: huggingface-hub>=0.20.0; extra == 'memory'
  Requires-Dist: onnxruntime>=1.17.0; extra == 'memory'
+ Requires-Dist: openai>=1.0.0; extra == 'memory'
  Requires-Dist: pyyaml>=6.0.0; extra == 'memory'
  Requires-Dist: transformers>=4.30.0; extra == 'memory'
  Requires-Dist: watchfiles>=0.21.0; extra == 'memory'
@@ -66,6 +67,7 @@ Requires-Dist: fastapi[standard]; extra == 'rag'
  Requires-Dist: huggingface-hub>=0.20.0; extra == 'rag'
  Requires-Dist: markitdown[docx,pdf,pptx]>=0.1.3; extra == 'rag'
  Requires-Dist: onnxruntime>=1.17.0; extra == 'rag'
+ Requires-Dist: openai>=1.0.0; extra == 'rag'
  Requires-Dist: transformers>=4.30.0; extra == 'rag'
  Requires-Dist: watchfiles>=0.21.0; extra == 'rag'
  Provides-Extra: server
@@ -79,7 +81,9 @@ Requires-Dist: pytest-mock; extra == 'test'
  Requires-Dist: pytest-timeout; extra == 'test'
  Requires-Dist: pytest>=7.0.0; extra == 'test'
  Provides-Extra: vad
- Requires-Dist: silero-vad>=5.1; extra == 'vad'
+ Requires-Dist: silero-vad-lite>=0.2.1; extra == 'vad'
+ Provides-Extra: wyoming
+ Requires-Dist: wyoming>=1.5.2; extra == 'wyoming'
  Description-Content-Type: text/markdown

  # Agent CLI
@@ -132,7 +136,7 @@ Since then I have expanded the tool with many more features, all focused on loca
  - **[`memory`](docs/commands/memory.md)**: Long-term memory system with `memory proxy` and `memory add`.
  - **[`rag-proxy`](docs/commands/rag-proxy.md)**: RAG proxy server for chatting with your documents.
  - **[`dev`](docs/commands/dev.md)**: Parallel development with git worktrees and AI coding agents.
- - **[`server`](docs/commands/server/index.md)**: Local ASR and TTS servers with dual-protocol (Wyoming & OpenAI), TTL-based memory management, and multi-platform acceleration. Whisper uses MLX on Apple Silicon or Faster Whisper on Linux/CUDA. TTS supports Kokoro (GPU) or Piper (CPU).
+ - **[`server`](docs/commands/server/index.md)**: Local ASR and TTS servers with dual-protocol (Wyoming & OpenAI-compatible APIs), TTL-based memory management, and multi-platform acceleration. Whisper uses MLX on Apple Silicon or Faster Whisper on Linux/CUDA. TTS supports Kokoro (GPU) or Piper (CPU).
  - **[`transcribe-daemon`](docs/commands/transcribe-daemon.md)**: Continuous background transcription with VAD. Install with `uv tool install "agent-cli[vad]" -p 3.13`.

  ## Quick Start
@@ -496,21 +500,43 @@ agent-cli install-extras rag memory vad

  Usage: agent-cli install-extras [OPTIONS] [EXTRAS]...

- Install optional extras (rag, memory, vad, etc.) with pinned versions.
+ Install optional dependencies with pinned, compatible versions.
+
+ Many agent-cli features require optional dependencies. This command installs them with
+ version pinning to ensure compatibility. Dependencies persist across uv tool upgrade
+ when installed via uv tool.
+
+ Available extras:
+
+ • rag - RAG proxy server (ChromaDB, embeddings)
+ • memory - Long-term memory proxy (ChromaDB)
+ • vad - Voice Activity Detection (silero-vad)
+ • audio - Local audio recording/playback
+ • piper - Local Piper TTS engine
+ • kokoro - Kokoro neural TTS engine
+ • faster-whisper - Whisper ASR for CUDA/CPU
+ • mlx-whisper - Whisper ASR for Apple Silicon
+ • wyoming - Wyoming protocol for ASR/TTS servers
+ • server - FastAPI server components
+ • speed - Audio speed adjustment
+ • llm - LLM framework (pydantic-ai)

  Examples:

- • agent-cli install-extras rag # Install RAG dependencies
- agent-cli install-extras memory vad # Install multiple extras
- agent-cli install-extras --list # Show available extras
- agent-cli install-extras --all # Install all extras
+
+ agent-cli install-extras rag # Install RAG dependencies
+ agent-cli install-extras memory vad # Install multiple extras
+ agent-cli install-extras --list # Show available extras
+ agent-cli install-extras --all # Install all extras
+

  ╭─ Arguments ────────────────────────────────────────────────────────────────────────────╮
- │ extras [EXTRAS]... Extras to install
+ │ extras [EXTRAS]... Extras to install: rag, memory, vad, audio, piper, kokoro,
+ │ faster-whisper, mlx-whisper, wyoming, server, speed, llm │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
- │ --list -l List available extras
- │ --all -a Install all available extras
+ │ --list -l Show available extras with descriptions (what each one enables)
+ │ --all -a Install all available extras at once
  │ --help -h Show this message and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯

@@ -569,13 +595,21 @@ agent-cli config edit

  Manage agent-cli configuration files.

+ Config files are TOML format and searched in order:
+
+ 1 ./agent-cli-config.toml (project-local)
+ 2 ~/.config/agent-cli/config.toml (user default)
+
+ Settings in [defaults] apply to all commands. Override per-command with sections like
+ [chat] or [transcribe]. CLI arguments override config file settings.
+

  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
  │ --help -h Show this message and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Commands ─────────────────────────────────────────────────────────────────────────────╮
- │ init Create a new config file with all options commented out.
+ │ init Create a new config file with all options as commented-out examples.
  │ edit Open the config file in your default editor. │
- │ show Display the config file location and contents.
+ │ show Display the active config file path and contents.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯

  ```
@@ -633,27 +667,58 @@ the `[defaults]` section of your configuration file.

  Usage: agent-cli autocorrect [OPTIONS] [TEXT]

- Correct text from clipboard using a local or remote LLM.
+ Fix grammar, spelling, and punctuation using an LLM.
+
+ Reads text from clipboard (or argument), sends to LLM for correction, and copies the
+ result back to clipboard. Only makes technical corrections without changing meaning or
+ tone.
+
+ Workflow:
+
+ 1 Read text from clipboard (or TEXT argument)
+ 2 Send to LLM for grammar/spelling/punctuation fixes
+ 3 Copy corrected text to clipboard (unless --json)
+ 4 Display result
+
+ Examples:
+
+
+ # Correct text from clipboard (default)
+ agent-cli autocorrect
+
+ # Correct specific text
+ agent-cli autocorrect "this is incorect"
+
+ # Use OpenAI instead of local Ollama
+ agent-cli autocorrect --llm-provider openai
+
+ # Get JSON output for scripting (disables clipboard)
+ agent-cli autocorrect --json
+

  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
- │ text [TEXT] The text to correct. If not provided, reads from clipboard.
+ │ text [TEXT] Text to correct. If omitted, reads from system clipboard.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
  │ --help -h Show this message and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
  │ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
+ │ [env var: LLM_PROVIDER] │
  │ [default: ollama] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
+ │ [env var: LLM_OLLAMA_MODEL] │
  │ [default: gemma3:4b] │
  │ --llm-ollama-host TEXT The Ollama server host. Default is │
  │ http://localhost:11434. │
+ │ [env var: LLM_OLLAMA_HOST] │
  │ [default: http://localhost:11434] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
  │ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
+ │ [env var: LLM_OPENAI_MODEL] │
  │ [default: gpt-5-mini] │
  │ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
  │ OPENAI_API_KEY environment variable. │
@@ -664,21 +729,24 @@ the `[defaults]` section of your configuration file.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
+ │ [env var: LLM_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  │ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
  │ GEMINI_API_KEY environment variable. │
  │ [env var: GEMINI_API_KEY] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
- │ --log-level TEXT Set logging level.
- [default: WARNING]
- --log-file TEXT Path to a file to write logs to.
- │ --quiet -q Suppress console output from rich.
- │ --json Output result as JSON for automation. Implies --quiet and
- --no-clipboard.
- --config TEXT Path to a TOML configuration file.
- │ --print-args Print the command line arguments, including variables
- taken from the configuration file.
+ │ --log-level [debug|info|warning|error] Set logging level.
+ [env var: LOG_LEVEL]
+ [default: warning]
+ │ --log-file TEXT Path to a file to write logs to.
+ │ --quiet -q Suppress console output from rich.
+ --json Output result as JSON (implies
+ --quiet and --no-clipboard).
+ │ --config TEXT Path to a TOML configuration file.
+ --print-args Print the command line arguments,
+ │ including variables taken from the │
+ │ configuration file. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯

  ```
@@ -722,71 +790,104 @@ the `[defaults]` section of your configuration file.

  Usage: agent-cli transcribe [OPTIONS]

- Wyoming ASR Client for streaming microphone audio to a transcription server.
+ Record audio from microphone and transcribe to text.
+
+ Records until you press Ctrl+C (or send SIGINT), then transcribes using your configured
+ ASR provider. The transcript is copied to the clipboard by default.
+
+ With --llm: Passes the raw transcript through an LLM to clean up speech recognition
+ errors, add punctuation, remove filler words, and improve readability.
+
+ With --toggle: Bind to a hotkey for push-to-talk. First call starts recording, second
+ call stops and transcribes.
+
+ Examples:
+
+ • Record and transcribe: agent-cli transcribe
+ • With LLM cleanup: agent-cli transcribe --llm
+ • Re-transcribe last recording: agent-cli transcribe --last-recording 1

  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
  │ --help -h Show this message and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM Configuration ────────────────────────────────────────────────────────────────────╮
- │ --extra-instructions TEXT Additional instructions for the LLM to
- process the transcription.
- │ --llm --no-llm Use an LLM to process the transcript.
+ │ --extra-instructions TEXT Extra instructions appended to the LLM │
+ cleanup prompt (requires --llm).
+ │ --llm --no-llm Clean up transcript with LLM: fix errors,
+ │ add punctuation, remove filler words. Uses │
+ │ --extra-instructions if set (via CLI or │
+ │ config file). │
  │ [default: no-llm] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Recovery ───────────────────────────────────────────────────────────────────────╮
- │ --from-file PATH Transcribe audio from a file
- (supports wav, mp3, m4a, ogg,
- │ flac, aac, webm). Requires ffmpeg
- │ for non-WAV formats with Wyoming
- provider.
- │ --last-recording INTEGER Transcribe a saved recording. Use
- │ 1 for most recent, 2 for
- second-to-last, etc. Use 0 to
- disable (default).
+ │ --from-file PATH Transcribe from audio file instead
+ of microphone. Supports wav, mp3,
+ m4a, ogg, flac, aac, webm.
+ Requires ffmpeg for non-WAV
+ formats with Wyoming.
+ │ --last-recording INTEGER Re-transcribe a saved recording
+ (1=most recent, 2=second-to-last,
+ │ etc). Useful after connection
+ failures or to retry with
+ │ different options. │
  │ [default: 0] │
- │ --save-recording --no-save-recording Save the audio recording to disk
- │ for recovery.
+ │ --save-recording --no-save-recording Save recordings to
+ ~/.cache/agent-cli/ for
+ │ --last-recording recovery. │
  │ [default: save-recording] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
  │ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
+ │ [env var: ASR_PROVIDER] │
  │ [default: wyoming] │
  │ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
+ │ [env var: LLM_PROVIDER] │
  │ [default: ollama] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
- │ --input-device-index INTEGER Index of the audio input device to use.
- --input-device-name TEXT Device name keywords for partial matching.
- │ --list-devices List available audio input and output devices and
- exit.
+ │ --input-device-index INTEGER Audio input device index (see --list-devices).
+ Uses system default if omitted.
+ │ --input-device-name TEXT Select input device by name substring (e.g.,
+ MacBook or USB).
+ │ --list-devices List available audio devices with their indices │
+ │ and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
  │ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
+ │ [env var: ASR_WYOMING_IP] │
  │ [default: localhost] │
  │ --asr-wyoming-port INTEGER Wyoming ASR server port. │
+ │ [env var: ASR_WYOMING_PORT] │
  │ [default: 10300] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
  │ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
+ │ [env var: ASR_OPENAI_MODEL] │
  │ [default: whisper-1] │
  │ --asr-openai-base-url TEXT Custom base URL for OpenAI-compatible ASR API │
  │ (e.g., for custom Whisper server: │
  │ http://localhost:9898). │
+ │ [env var: ASR_OPENAI_BASE_URL] │
  │ --asr-openai-prompt TEXT Custom prompt to guide transcription (optional). │
+ │ [env var: ASR_OPENAI_PROMPT] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
  │ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
+ │ [env var: ASR_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
+ │ [env var: LLM_OLLAMA_MODEL] │
  │ [default: gemma3:4b] │
  │ --llm-ollama-host TEXT The Ollama server host. Default is │
  │ http://localhost:11434. │
+ │ [env var: LLM_OLLAMA_HOST] │
  │ [default: http://localhost:11434] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
  │ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
+ │ [env var: LLM_OPENAI_MODEL] │
  │ [default: gpt-5-mini] │
  │ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
  │ OPENAI_API_KEY environment variable. │
@@ -797,33 +898,45 @@ the `[defaults]` section of your configuration file.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
+ │ [env var: LLM_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  │ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
  │ GEMINI_API_KEY environment variable. │
  │ [env var: GEMINI_API_KEY] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Process Management ───────────────────────────────────────────────────────────────────╮
- │ --stop Stop any running background process.
- │ --status Check if a background process is running.
- │ --toggle Toggle the background process on/off. If the process is running, it
- │ will be stopped. If the process is not running, it will be started. │
+ │ --stop Stop any running instance of this command.
+ │ --status Check if an instance is currently running.
+ │ --toggle Start if not running, stop if running. Ideal for hotkey binding.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
- │ --clipboard --no-clipboard Copy result to clipboard.
- [default: clipboard]
- --log-level TEXT Set logging level.
- [default: WARNING]
- --log-file TEXT Path to a file to write logs to.
- --quiet -q Suppress console output from rich.
- │ --json Output result as JSON for automation.
- Implies --quiet and --no-clipboard.
- │ --config TEXT Path to a TOML configuration file.
- --print-args Print the command line arguments,
- including variables taken from the
- configuration file.
- --transcription-log PATH Path to log transcription results
- with timestamps, hostname, model, and
- raw output.
+ │ --clipboard --no-clipboard Copy result to
+ clipboard.
+ [default: clipboard]
+ --log-level [debug|info|warning| Set logging level.
+ error] [env var: LOG_LEVEL]
+ [default: warning]
+ │ --log-file TEXT Path to a file to
+ write logs to.
+ │ --quiet -q Suppress console
+ output from rich.
+ --json Output result as JSON
+ (implies --quiet and
+ --no-clipboard).
+ --config TEXT Path to a TOML
+ configuration file.
+ │ --print-args Print the command │
+ │ line arguments, │
+ │ including variables │
+ │ taken from the │
+ │ configuration file. │
+ │ --transcription-log PATH Append transcripts to │
+ │ JSONL file │
+ │ (timestamp, hostname, │
+ │ model, raw/processed │
+ │ text). Recent entries │
+ │ provide context for │
+ │ LLM cleanup. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯

  ```
@@ -879,88 +992,131 @@ uv tool install "agent-cli[vad]" -p 3.13

  Usage: agent-cli transcribe-daemon [OPTIONS]

- Run a continuous transcription daemon with voice activity detection.
+ Continuous transcription daemon using Silero VAD for speech detection.

- This command runs indefinitely, capturing audio from your microphone, detecting speech
- segments using Silero VAD, transcribing them, and logging results with timestamps.
+ Unlike transcribe (single recording session), this daemon runs indefinitely and
+ automatically detects speech segments using Voice Activity Detection (VAD). Each
+ detected segment is transcribed and logged with timestamps.

- Examples: # Basic daemon agent-cli transcribe-daemon
+ How it works:
+
+ 1 Listens continuously to microphone input
+ 2 Silero VAD detects when you start/stop speaking
+ 3 After --silence-threshold seconds of silence, the segment is finalized
+ 4 Segment is transcribed (and optionally cleaned by LLM with --llm)
+ 5 Results are appended to the JSONL log file
+ 6 Audio is saved as MP3 if --save-audio is enabled (requires ffmpeg)

+ Use cases: Meeting transcription, note-taking, voice journaling, accessibility.

- # With role and custom silence threshold
+ Examples:
+
+
+ agent-cli transcribe-daemon
  agent-cli transcribe-daemon --role meeting --silence-threshold 1.5
+ agent-cli transcribe-daemon --llm --clipboard --role notes
+ agent-cli transcribe-daemon --transcription-log ~/meeting.jsonl --no-save-audio
+ agent-cli transcribe-daemon --asr-provider openai --llm-provider gemini --llm

- # With LLM cleanup
- agent-cli transcribe-daemon --llm --role notes

- # Custom log file and audio directory
- agent-cli transcribe-daemon --transcription-log ~/meeting.jsonl --audio-dir ~/audio
+ Tips:

+ • Use --role to tag entries (e.g., speaker1, meeting, personal)
+ • Adjust --vad-threshold if detection is too sensitive (increase) or missing speech
+ (decrease)
+ • Use --stop to cleanly terminate a running daemon
+ • With --llm, transcripts are cleaned up (punctuation, filler words removed)

  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
- │ --role -r TEXT Role name for logging (e.g.,
- 'meeting', 'notes', 'user').
+ │ --role -r TEXT Label for log entries. Use to
+ distinguish speakers or contexts in
+ │ logs. │
  │ [default: user] │
- │ --silence-threshold -s FLOAT Seconds of silence to end a speech
- │ segment.
+ │ --silence-threshold -s FLOAT Seconds of silence after speech to
+ finalize a segment. Increase for
+ │ slower speakers. │
  │ [default: 1.0] │
- │ --min-segment -m FLOAT Minimum speech duration in seconds
- to trigger a segment.
+ │ --min-segment -m FLOAT Minimum seconds of speech required
+ before a segment is processed.
+ │ Filters brief sounds. │
  │ [default: 0.25] │
- │ --vad-threshold FLOAT VAD speech detection threshold
- │ (0.0-1.0). Higher = more aggressive
- filtering.
+ │ --vad-threshold FLOAT Silero VAD confidence threshold
+ │ (0.0-1.0). Higher values require
+ clearer speech; lower values are
+ │ more sensitive to quiet/distant │
+ │ voices. │
  │ [default: 0.3] │
- │ --save-audio --no-save-audio Save audio segments as MP3 files.
+ │ --save-audio --no-save-audio Save each speech segment as MP3.
+ │ Requires ffmpeg to be installed. │
  │ [default: save-audio] │
- │ --audio-dir PATH Directory for MP3 files. Default:
- ~/.config/agent-cli/audio
- --transcription-log -t PATH JSON Lines log file path. Default:
+ │ --audio-dir PATH Base directory for MP3 files. Files
+ are organized by date:
+ YYYY/MM/DD/HHMMSS_mmm.mp3. Default:
+ │ ~/.config/agent-cli/audio. │
+ │ --transcription-log -t PATH JSONL file for transcript logging │
+ │ (one JSON object per line with │
+ │ timestamp, role, raw/processed │
+ │ text, audio path). Default: │
  │ ~/.config/agent-cli/transcriptions… │
- │ --clipboard --no-clipboard Copy each transcription to
- │ clipboard.
+ │ --clipboard --no-clipboard Copy each completed transcription
+ to clipboard (overwrites previous).
+ │ Useful with --llm to get cleaned │
+ │ text. │
  │ [default: no-clipboard] │
  │ --help -h Show this message and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
  │ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
+ │ [env var: ASR_PROVIDER] │
  │ [default: wyoming] │
  │ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
+ │ [env var: LLM_PROVIDER] │
  │ [default: ollama] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
- │ --input-device-index INTEGER Index of the audio input device to use.
- --input-device-name TEXT Device name keywords for partial matching.
- │ --list-devices List available audio input and output devices and
- exit.
+ │ --input-device-index INTEGER Audio input device index (see --list-devices).
+ Uses system default if omitted.
+ │ --input-device-name TEXT Select input device by name substring (e.g.,
+ MacBook or USB).
+ │ --list-devices List available audio devices with their indices │
+ │ and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
  │ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
+ │ [env var: ASR_WYOMING_IP] │
  │ [default: localhost] │
  │ --asr-wyoming-port INTEGER Wyoming ASR server port. │
+ │ [env var: ASR_WYOMING_PORT] │
  │ [default: 10300] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
  │ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
+ │ [env var: ASR_OPENAI_MODEL] │
  │ [default: whisper-1] │
  │ --asr-openai-base-url TEXT Custom base URL for OpenAI-compatible ASR API │
  │ (e.g., for custom Whisper server: │
  │ http://localhost:9898). │
+ │ [env var: ASR_OPENAI_BASE_URL] │
  │ --asr-openai-prompt TEXT Custom prompt to guide transcription (optional). │
+ │ [env var: ASR_OPENAI_PROMPT] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
  │ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
+ │ [env var: ASR_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
+ │ [env var: LLM_OLLAMA_MODEL] │
  │ [default: gemma3:4b] │
  │ --llm-ollama-host TEXT The Ollama server host. Default is │
  │ http://localhost:11434. │
+ │ [env var: LLM_OLLAMA_HOST] │
  │ [default: http://localhost:11434] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
  │ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
+ │ [env var: LLM_OPENAI_MODEL] │
  │ [default: gpt-5-mini] │
  │ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
  │ OPENAI_API_KEY environment variable. │
971
1127
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
972
1128
  ╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
973
1129
  │ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
1130
+ │ [env var: LLM_GEMINI_MODEL] │
974
1131
  │ [default: gemini-3-flash-preview] │
975
1132
  │ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
976
1133
  │ GEMINI_API_KEY environment variable. │
977
1134
  │ [env var: GEMINI_API_KEY] │
978
1135
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
979
1136
  ╭─ LLM Configuration ────────────────────────────────────────────────────────────────────╮
980
- │ --llm --no-llm Use an LLM to process the transcript.
1137
+ │ --llm --no-llm Clean up transcript with LLM: fix errors, add punctuation,
1138
+ │ remove filler words. Uses --extra-instructions if set (via CLI │
1139
+ │ or config file). │
981
1140
  │ [default: no-llm] │
982
1141
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
983
1142
  ╭─ Process Management ───────────────────────────────────────────────────────────────────╮
984
- │ --stop Stop any running background process.
985
- │ --status Check if a background process is running.
1143
+ │ --stop Stop any running instance of this command.
1144
+ │ --status Check if an instance is currently running.
986
1145
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
987
1146
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
988
- │ --log-level TEXT Set logging level.
989
- [default: WARNING]
990
- --log-file TEXT Path to a file to write logs to.
991
- │ --quiet -q Suppress console output from rich.
992
- │ --config TEXT Path to a TOML configuration file.
993
- │ --print-args Print the command line arguments, including variables
994
- taken from the configuration file.
1147
+ │ --log-level [debug|info|warning|error] Set logging level.
1148
+ [env var: LOG_LEVEL]
1149
+ [default: warning]
1150
+ │ --log-file TEXT Path to a file to write logs to.
1151
+ │ --quiet -q Suppress console output from rich.
1152
+ │ --config TEXT Path to a TOML configuration file.
1153
+ --print-args Print the command line arguments,
1154
+ │ including variables taken from the │
1155
+ │ configuration file. │
995
1156
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
996
1157
 
997
1158
  ```
@@ -1034,10 +1195,25 @@ uv tool install "agent-cli[vad]" -p 3.13
 
  Usage: agent-cli speak [OPTIONS] [TEXT]
 
- Convert text to speech using Wyoming or OpenAI-compatible TTS server.
+ Convert text to speech and play audio through speakers.
+
+ By default, synthesized audio plays immediately. Use --save-file to save to a WAV file
+ instead (skips playback).
+
+ Text can be provided as an argument or read from clipboard automatically.
+
+ Examples:
+
+ Speak text directly: agent-cli speak "Hello, world!"
+
+ Speak clipboard contents: agent-cli speak
+
+ Save to file instead of playing: agent-cli speak "Hello" --save-file greeting.wav
+
+ Use OpenAI-compatible TTS: agent-cli speak "Hello" --tts-provider openai
 
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
- │ text [TEXT] Text to speak. Reads from clipboard if not provided.
+ │ text [TEXT] Text to synthesize. If not provided, reads from clipboard.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
  │ --help -h Show this message and exit. │
@@ -1045,12 +1221,14 @@ uv tool install "agent-cli[vad]" -p 3.13
  ╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
  │ --tts-provider TEXT The TTS provider to use ('wyoming', 'openai', 'kokoro', │
  │ 'gemini'). │
+ │ [env var: TTS_PROVIDER] │
  │ [default: wyoming] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Output ─────────────────────────────────────────────────────────────────────────╮
- │ --output-device-index INTEGER Index of the audio output device to use for TTS.
- --output-device-name TEXT Output device name keywords for partial
- matching.
+ │ --output-device-index INTEGER Audio output device index (see --list-devices
+ for available devices).
+ --output-device-name TEXT Partial match on device name (e.g., 'speakers',
+ │ 'headphones'). │
  │ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, 2.0 = │
  │ twice as fast, 0.5 = half speed). │
  │ [default: 1.0] │
@@ -1068,7 +1246,8 @@ uv tool install "agent-cli[vad]" -p 3.13
  ╭─ Audio Output: OpenAI-compatible ──────────────────────────────────────────────────────╮
  │ --tts-openai-model TEXT The OpenAI model to use for TTS. │
  │ [default: tts-1] │
- │ --tts-openai-voice TEXT The voice to use for OpenAI-compatible TTS.
+ │ --tts-openai-voice TEXT Voice for OpenAI TTS (alloy, echo, fable, onyx,
+ │ nova, shimmer). │
  │ [default: alloy] │
  │ --tts-openai-base-url TEXT Custom base URL for OpenAI-compatible TTS API │
  │ (e.g., http://localhost:8000/v1 for a proxy). │
@@ -1094,25 +1273,27 @@ uv tool install "agent-cli[vad]" -p 3.13
  │ [env var: GEMINI_API_KEY] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
- │ --list-devices List available audio input and output devices and exit.
+ │ --list-devices List available audio devices with their indices and exit.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
- │ --save-file PATH Save TTS response audio to WAV file.
- --log-level TEXT Set logging level.
- [default: WARNING]
- --log-file TEXT Path to a file to write logs to.
- --quiet -q Suppress console output from rich.
- │ --json Output result as JSON for automation. Implies --quiet and
- --no-clipboard.
- │ --config TEXT Path to a TOML configuration file.
- │ --print-args Print the command line arguments, including variables
- taken from the configuration file.
+ │ --save-file PATH Save audio to WAV file instead of
+ playing through speakers.
+ --log-level [debug|info|warning|error] Set logging level.
+ [env var: LOG_LEVEL]
+ [default: warning]
+ │ --log-file TEXT Path to a file to write logs to.
+ --quiet -q Suppress console output from rich.
+ │ --json Output result as JSON (implies
+ --quiet and --no-clipboard).
+ --config TEXT Path to a TOML configuration file.
+ │ --print-args Print the command line arguments, │
+ │ including variables taken from the │
+ │ configuration file. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Process Management ───────────────────────────────────────────────────────────────────╮
- │ --stop Stop any running background process.
- │ --status Check if a background process is running.
- │ --toggle Toggle the background process on/off. If the process is running, it
- │ will be stopped. If the process is not running, it will be started. │
+ │ --stop Stop any running instance of this command.
+ │ --status Check if an instance is currently running.
+ │ --toggle Start if not running, stop if running. Ideal for hotkey binding.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
@@ -1154,58 +1335,77 @@ uv tool install "agent-cli[vad]" -p 3.13
 
  Usage: agent-cli voice-edit [OPTIONS]
 
- Interact with clipboard text via a voice command using local or remote services.
+ Edit or query clipboard text using voice commands.
 
- Usage:
+ Workflow: Captures clipboard text → records your voice command → transcribes it → sends
+ both to an LLM → copies result back to clipboard.
 
- Run in foreground: agent-cli voice-edit --input-device-index 1
- Run in background: agent-cli voice-edit --input-device-index 1 &
- • Check status: agent-cli voice-edit --status
- Stop background process: agent-cli voice-edit --stop
- List output devices: agent-cli voice-edit --list-output-devices
- • Save TTS to file: agent-cli voice-edit --tts --save-file response.wav
+ Use this for hands-free text editing (e.g., "make this more formal") or asking questions
+ about clipboard content (e.g., "summarize this").
+
+ Typical hotkey integration: Run voice-edit & on keypress to start recording, then send
+ SIGINT (via --stop) on second keypress to process.
+
+ Examples:
+
+ • Basic usage: agent-cli voice-edit
+ • With TTS response: agent-cli voice-edit --tts
+ • Toggle on/off: agent-cli voice-edit --toggle
+ • List audio devices: agent-cli voice-edit --list-devices
 
  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
  │ --help -h Show this message and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
  │ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
+ │ [env var: ASR_PROVIDER] │
  │ [default: wyoming] │
  │ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
+ │ [env var: LLM_PROVIDER] │
  │ [default: ollama] │
  │ --tts-provider TEXT The TTS provider to use ('wyoming', 'openai', 'kokoro', │
  │ 'gemini'). │
+ │ [env var: TTS_PROVIDER] │
  │ [default: wyoming] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
- │ --input-device-index INTEGER Index of the audio input device to use.
- --input-device-name TEXT Device name keywords for partial matching.
- │ --list-devices List available audio input and output devices and
- exit.
+ │ --input-device-index INTEGER Audio input device index (see --list-devices).
+ Uses system default if omitted.
+ │ --input-device-name TEXT Select input device by name substring (e.g.,
+ MacBook or USB).
+ │ --list-devices List available audio devices with their indices │
+ │ and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
  │ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
+ │ [env var: ASR_WYOMING_IP] │
  │ [default: localhost] │
  │ --asr-wyoming-port INTEGER Wyoming ASR server port. │
+ │ [env var: ASR_WYOMING_PORT] │
  │ [default: 10300] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
  │ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
+ │ [env var: ASR_OPENAI_MODEL] │
  │ [default: whisper-1] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
  │ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
+ │ [env var: ASR_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
+ │ [env var: LLM_OLLAMA_MODEL] │
  │ [default: gemma3:4b] │
  │ --llm-ollama-host TEXT The Ollama server host. Default is │
  │ http://localhost:11434. │
+ │ [env var: LLM_OLLAMA_HOST] │
  │ [default: http://localhost:11434] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
  │ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
+ │ [env var: LLM_OPENAI_MODEL] │
  │ [default: gpt-5-mini] │
  │ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
  │ OPENAI_API_KEY environment variable. │
@@ -1216,6 +1416,7 @@ uv tool install "agent-cli[vad]" -p 3.13
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
+ │ [env var: LLM_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  │ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
  │ GEMINI_API_KEY environment variable. │
@@ -1224,10 +1425,10 @@ uv tool install "agent-cli[vad]" -p 3.13
  ╭─ Audio Output ─────────────────────────────────────────────────────────────────────────╮
  │ --tts --no-tts Enable text-to-speech for responses. │
  │ [default: no-tts] │
- │ --output-device-index INTEGER Index of the audio output device to use
- │ for TTS.
- │ --output-device-name TEXT Output device name keywords for partial
- matching.
+ │ --output-device-index INTEGER Audio output device index (see
+ --list-devices for available devices).
+ │ --output-device-name TEXT Partial match on device name (e.g.,
+ 'speakers', 'headphones').
  │ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, │
  │ 2.0 = twice as fast, 0.5 = half speed). │
  │ [default: 1.0] │
@@ -1245,7 +1446,8 @@ uv tool install "agent-cli[vad]" -p 3.13
  ╭─ Audio Output: OpenAI-compatible ──────────────────────────────────────────────────────╮
  │ --tts-openai-model TEXT The OpenAI model to use for TTS. │
  │ [default: tts-1] │
- │ --tts-openai-voice TEXT The voice to use for OpenAI-compatible TTS.
+ │ --tts-openai-voice TEXT Voice for OpenAI TTS (alloy, echo, fable, onyx,
+ │ nova, shimmer). │
  │ [default: alloy] │
  │ --tts-openai-base-url TEXT Custom base URL for OpenAI-compatible TTS API │
  │ (e.g., http://localhost:8000/v1 for a proxy). │
@@ -1266,24 +1468,33 @@ uv tool install "agent-cli[vad]" -p 3.13
  │ [default: Kore] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Process Management ───────────────────────────────────────────────────────────────────╮
- │ --stop Stop any running background process.
- │ --status Check if a background process is running.
- │ --toggle Toggle the background process on/off. If the process is running, it
- │ will be stopped. If the process is not running, it will be started. │
+ │ --stop Stop any running instance of this command.
+ │ --status Check if an instance is currently running.
+ │ --toggle Start if not running, stop if running. Ideal for hotkey binding.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
- │ --save-file PATH Save TTS response audio to WAV file.
- --clipboard --no-clipboard Copy result to clipboard.
- [default: clipboard]
- │ --log-level TEXT Set logging level.
- [default: WARNING]
- --log-file TEXT Path to a file to write logs to.
- │ --quiet -q Suppress console output from rich.
- --json Output result as JSON for automation.
- Implies --quiet and --no-clipboard.
- │ --config TEXT Path to a TOML configuration file.
- --print-args Print the command line arguments, including
- variables taken from the configuration file.
+ │ --save-file PATH Save audio to WAV file
+ instead of playing
+ through speakers.
+ │ --clipboard --no-clipboard Copy result to
+ clipboard.
+ [default: clipboard]
+ │ --log-level [debug|info|warning|erro Set logging level.
+ r] [env var: LOG_LEVEL]
+ [default: warning]
+ │ --log-file TEXT Path to a file to write
+ logs to.
+ --quiet -q Suppress console output
+ │ from rich. │
+ │ --json Output result as JSON │
+ │ (implies --quiet and │
+ │ --no-clipboard). │
+ │ --config TEXT Path to a TOML │
+ │ configuration file. │
+ │ --print-args Print the command line │
+ │ arguments, including │
+ │ variables taken from the │
+ │ configuration file. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
@@ -1328,58 +1539,93 @@ uv tool install "agent-cli[vad]" -p 3.13
 
  Usage: agent-cli assistant [OPTIONS]
 
- Wake word-based voice assistant using local or remote services.
+ Hands-free voice assistant using wake word detection.
+
+ Continuously listens for a wake word, then records your speech until you say the wake
+ word again. The recording is transcribed and sent to an LLM for a conversational
+ response, optionally spoken back via TTS.
+
+ Conversation flow:
+
+ 1 Say wake word → starts recording
+ 2 Speak your question/command
+ 3 Say wake word again → stops recording and processes
+
+ The assistant runs in a loop, ready for the next command after each response. Stop with
+ Ctrl+C or --stop.
+
+ Requirements:
+
+ • Wyoming wake word server (e.g., wyoming-openwakeword on port 10400)
+ • Wyoming ASR server (e.g., wyoming-whisper on port 10300)
+ • Optional: TTS server for spoken responses (enable with --tts)
+
+ Example: assistant --wake-word ok_nabu --tts --input-device-name USB
 
  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
  │ --help -h Show this message and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
  │ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
+ │ [env var: ASR_PROVIDER] │
  │ [default: wyoming] │
  │ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
+ │ [env var: LLM_PROVIDER] │
  │ [default: ollama] │
  │ --tts-provider TEXT The TTS provider to use ('wyoming', 'openai', 'kokoro', │
  │ 'gemini'). │
+ │ [env var: TTS_PROVIDER] │
  │ [default: wyoming] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Wake Word ────────────────────────────────────────────────────────────────────────────╮
- │ --wake-server-ip TEXT Wyoming wake word server IP address.
+ │ --wake-server-ip TEXT Wyoming wake word server IP (requires
+ │ wyoming-openwakeword or similar). │
  │ [default: localhost] │
  │ --wake-server-port INTEGER Wyoming wake word server port. │
  │ [default: 10400] │
- │ --wake-word TEXT Name of wake word to detect (e.g., 'ok_nabu', │
- 'hey_jarvis').
+ │ --wake-word TEXT Wake word to detect. Common options: ok_nabu, │
+ │ hey_jarvis, alexa. Must match a model loaded in
+ │ your wake word server. │
  │ [default: ok_nabu] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
- │ --input-device-index INTEGER Index of the audio input device to use.
- --input-device-name TEXT Device name keywords for partial matching.
- │ --list-devices List available audio input and output devices and
- exit.
+ │ --input-device-index INTEGER Audio input device index (see --list-devices).
+ Uses system default if omitted.
+ │ --input-device-name TEXT Select input device by name substring (e.g.,
+ MacBook or USB).
+ │ --list-devices List available audio devices with their indices │
+ │ and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
  │ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
+ │ [env var: ASR_WYOMING_IP] │
  │ [default: localhost] │
  │ --asr-wyoming-port INTEGER Wyoming ASR server port. │
+ │ [env var: ASR_WYOMING_PORT] │
  │ [default: 10300] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
  │ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
+ │ [env var: ASR_OPENAI_MODEL] │
  │ [default: whisper-1] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
  │ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
+ │ [env var: ASR_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
+ │ [env var: LLM_OLLAMA_MODEL] │
  │ [default: gemma3:4b] │
  │ --llm-ollama-host TEXT The Ollama server host. Default is │
  │ http://localhost:11434. │
+ │ [env var: LLM_OLLAMA_HOST] │
  │ [default: http://localhost:11434] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
  │ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
+ │ [env var: LLM_OPENAI_MODEL] │
  │ [default: gpt-5-mini] │
  │ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
  │ OPENAI_API_KEY environment variable. │
@@ -1390,6 +1636,7 @@ uv tool install "agent-cli[vad]" -p 3.13
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
+ │ [env var: LLM_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  │ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
  │ GEMINI_API_KEY environment variable. │
@@ -1398,10 +1645,10 @@ uv tool install "agent-cli[vad]" -p 3.13
  ╭─ Audio Output ─────────────────────────────────────────────────────────────────────────╮
  │ --tts --no-tts Enable text-to-speech for responses. │
  │ [default: no-tts] │
- │ --output-device-index INTEGER Index of the audio output device to use
- │ for TTS.
- │ --output-device-name TEXT Output device name keywords for partial
- matching.
+ │ --output-device-index INTEGER Audio output device index (see
+ --list-devices for available devices).
+ │ --output-device-name TEXT Partial match on device name (e.g.,
+ 'speakers', 'headphones').
  │ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, │
  │ 2.0 = twice as fast, 0.5 = half speed). │
  │ [default: 1.0] │
@@ -1419,7 +1666,8 @@ uv tool install "agent-cli[vad]" -p 3.13
  ╭─ Audio Output: OpenAI-compatible ──────────────────────────────────────────────────────╮
  │ --tts-openai-model TEXT The OpenAI model to use for TTS. │
  │ [default: tts-1] │
- │ --tts-openai-voice TEXT The voice to use for OpenAI-compatible TTS.
+ │ --tts-openai-voice TEXT Voice for OpenAI TTS (alloy, echo, fable, onyx,
+ │ nova, shimmer). │
  │ [default: alloy] │
  │ --tts-openai-base-url TEXT Custom base URL for OpenAI-compatible TTS API │
  │ (e.g., http://localhost:8000/v1 for a proxy). │
@@ -1440,22 +1688,30 @@ uv tool install "agent-cli[vad]" -p 3.13
  │ [default: Kore] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Process Management ───────────────────────────────────────────────────────────────────╮
- │ --stop Stop any running background process.
- │ --status Check if a background process is running.
- │ --toggle Toggle the background process on/off. If the process is running, it
- │ will be stopped. If the process is not running, it will be started. │
+ │ --stop Stop any running instance of this command.
+ │ --status Check if an instance is currently running.
+ │ --toggle Start if not running, stop if running. Ideal for hotkey binding.
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
- │ --save-file PATH Save TTS response audio to WAV file.
- --clipboard --no-clipboard Copy result to clipboard.
- [default: clipboard]
- │ --log-level TEXT Set logging level.
- [default: WARNING]
- --log-file TEXT Path to a file to write logs to.
- │ --quiet -q Suppress console output from rich.
- --config TEXT Path to a TOML configuration file.
- --print-args Print the command line arguments, including
- variables taken from the configuration file.
+ │ --save-file PATH Save audio to WAV file
+ instead of playing
+ through speakers.
+ │ --clipboard --no-clipboard Copy result to
+ clipboard.
+ [default: clipboard]
+ │ --log-level [debug|info|warning|erro Set logging level.
+ r] [env var: LOG_LEVEL]
+ [default: warning]
+ --log-file TEXT Path to a file to write
+ │ logs to. │
+ │ --quiet -q Suppress console output │
+ │ from rich. │
+ │ --config TEXT Path to a TOML │
+ │ configuration file. │
+ │ --print-args Print the command line │
+ │ arguments, including │
+ │ variables taken from the │
+ │ configuration file. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
 
  ```
@@ -1507,53 +1763,99 @@ uv tool install "agent-cli[vad]" -p 3.13
 
  Usage: agent-cli chat [OPTIONS]
 
- An chat agent that you can talk to.
+ Voice-based conversational chat agent with memory and tools.
+
+ Runs an interactive loop: listen → transcribe → LLM → speak response. Conversation
+ history is persisted and included as context for continuity.
+
+ Built-in tools (LLM uses automatically when relevant):
+
+ • add_memory/search_memory/update_memory - persistent long-term memory
+ • duckduckgo_search - web search for current information
+ • read_file/execute_code - file access and shell commands
+
+ Process management: Use --toggle to start/stop via hotkey (bind to a keyboard shortcut),
+ --stop to terminate, or --status to check state.
+
+ Examples:
+
+ Use OpenAI-compatible providers for speech and LLM, with TTS enabled:
+
+
+ agent-cli chat --asr-provider openai --llm-provider openai --tts
+
+
+ Start in background mode (toggle on/off with hotkey):
+
+
+ agent-cli chat --toggle
+
+
+ Use local Ollama LLM with Wyoming ASR:
+
+
+ agent-cli chat --llm-provider ollama
+
 
  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
  │ --help -h Show this message and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
  │ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
+ │ [env var: ASR_PROVIDER] │
  │ [default: wyoming] │
  │ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
+ │ [env var: LLM_PROVIDER] │
  │ [default: ollama] │
  │ --tts-provider TEXT The TTS provider to use ('wyoming', 'openai', 'kokoro', │
  │ 'gemini'). │
+ │ [env var: TTS_PROVIDER] │
  │ [default: wyoming] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
- │ --input-device-index INTEGER Index of the audio input device to use.
- --input-device-name TEXT Device name keywords for partial matching.
- │ --list-devices List available audio input and output devices and
- exit.
+ │ --input-device-index INTEGER Audio input device index (see --list-devices).
+ Uses system default if omitted.
+ │ --input-device-name TEXT Select input device by name substring (e.g.,
+ MacBook or USB).
+ │ --list-devices List available audio devices with their indices │
+ │ and exit. │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
  │ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
+ │ [env var: ASR_WYOMING_IP] │
  │ [default: localhost] │
  │ --asr-wyoming-port INTEGER Wyoming ASR server port. │
+ │ [env var: ASR_WYOMING_PORT] │
  │ [default: 10300] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
  │ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
+ │ [env var: ASR_OPENAI_MODEL] │
  │ [default: whisper-1] │
  │ --asr-openai-base-url TEXT Custom base URL for OpenAI-compatible ASR API │
  │ (e.g., for custom Whisper server: │
  │ http://localhost:9898). │
+ │ [env var: ASR_OPENAI_BASE_URL] │
  │ --asr-openai-prompt TEXT Custom prompt to guide transcription (optional). │
+ │ [env var: ASR_OPENAI_PROMPT] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
  │ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
+ │ [env var: ASR_GEMINI_MODEL] │
  │ [default: gemini-3-flash-preview] │
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
  ╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
  │ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
+ │ [env var: LLM_OLLAMA_MODEL] │
1550
1850
  │ [default: gemma3:4b] │
1551
1851
  │ --llm-ollama-host TEXT The Ollama server host. Default is │
1552
1852
  │ http://localhost:11434. │
1853
+ │ [env var: LLM_OLLAMA_HOST] │
1553
1854
  │ [default: http://localhost:11434] │
1554
1855
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1555
1856
  ╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
1556
1857
  │ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
1858
+ │ [env var: LLM_OPENAI_MODEL] │
1557
1859
  │ [default: gpt-5-mini] │
1558
1860
  │ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
1559
1861
  │ OPENAI_API_KEY environment variable. │
@@ -1564,6 +1866,7 @@ uv tool install "agent-cli[vad]" -p 3.13
1564
1866
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1565
1867
  ╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
1566
1868
  │ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
1869
+ │ [env var: LLM_GEMINI_MODEL] │
1567
1870
  │ [default: gemini-3-flash-preview] │
1568
1871
  │ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
1569
1872
  │ GEMINI_API_KEY environment variable. │
@@ -1572,10 +1875,10 @@ uv tool install "agent-cli[vad]" -p 3.13
1572
1875
  ╭─ Audio Output ─────────────────────────────────────────────────────────────────────────╮
1573
1876
  │ --tts --no-tts Enable text-to-speech for responses. │
1574
1877
  │ [default: no-tts] │
1575
- │ --output-device-index INTEGER Index of the audio output device to use
1576
- │ for TTS.
1577
- │ --output-device-name TEXT Output device name keywords for partial
1578
- matching.
1878
+ │ --output-device-index INTEGER Audio output device index (see
1879
+ --list-devices for available devices).
1880
+ │ --output-device-name TEXT Partial match on device name (e.g.,
1881
+ 'speakers', 'headphones').
1579
1882
  │ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, │
1580
1883
  │ 2.0 = twice as fast, 0.5 = half speed). │
1581
1884
  │ [default: 1.0] │
@@ -1593,7 +1896,8 @@ uv tool install "agent-cli[vad]" -p 3.13
1593
1896
  ╭─ Audio Output: OpenAI-compatible ──────────────────────────────────────────────────────╮
1594
1897
  │ --tts-openai-model TEXT The OpenAI model to use for TTS. │
1595
1898
  │ [default: tts-1] │
1596
- │ --tts-openai-voice TEXT The voice to use for OpenAI-compatible TTS.
1899
+ │ --tts-openai-voice TEXT Voice for OpenAI TTS (alloy, echo, fable, onyx,
1900
+ │ nova, shimmer). │
1597
1901
  │ [default: alloy] │
1598
1902
  │ --tts-openai-base-url TEXT Custom base URL for OpenAI-compatible TTS API │
1599
1903
  │ (e.g., http://localhost:8000/v1 for a proxy). │
@@ -1614,27 +1918,32 @@ uv tool install "agent-cli[vad]" -p 3.13
1614
1918
  │ [default: Kore] │
1615
1919
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1616
1920
  ╭─ Process Management ───────────────────────────────────────────────────────────────────╮
1617
- │ --stop Stop any running background process.
1618
- │ --status Check if a background process is running.
1619
- │ --toggle Toggle the background process on/off. If the process is running, it
1620
- │ will be stopped. If the process is not running, it will be started. │
1921
+ │ --stop Stop any running instance of this command.
1922
+ │ --status Check if an instance is currently running.
1923
+ │ --toggle Start if not running, stop if running. Ideal for hotkey binding.
1621
1924
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1622
1925
  ╭─ History Options ──────────────────────────────────────────────────────────────────────╮
1623
- │ --history-dir PATH Directory to store conversation history.
1926
+ │ --history-dir PATH Directory for conversation history and long-term
1927
+ │ memory. Both conversation.json and │
1928
+ │ long_term_memory.json are stored here. │
1624
1929
  │ [default: ~/.config/agent-cli/history] │
1625
- │ --last-n-messages INTEGER Number of messages to include in the conversation
1626
- history. Set to 0 to disable history.
1930
+ │ --last-n-messages INTEGER Number of past messages to include as context for
1931
+ the LLM. Set to 0 to start fresh each session
1932
+ │ (memory tools still persist). │
1627
1933
  │ [default: 50] │
1628
1934
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1629
1935
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
1630
- │ --save-file PATH Save TTS response audio to WAV file.
1631
- --log-level TEXT Set logging level.
1632
- [default: WARNING]
1633
- --log-file TEXT Path to a file to write logs to.
1634
- --quiet -q Suppress console output from rich.
1635
- │ --config TEXT Path to a TOML configuration file.
1636
- │ --print-args Print the command line arguments, including variables
1637
- taken from the configuration file.
1936
+ │ --save-file PATH Save audio to WAV file instead of
1937
+ playing through speakers.
1938
+ --log-level [debug|info|warning|error] Set logging level.
1939
+ [env var: LOG_LEVEL]
1940
+ [default: warning]
1941
+ │ --log-file TEXT Path to a file to write logs to.
1942
+ │ --quiet -q Suppress console output from rich.
1943
+ --config TEXT Path to a TOML configuration file.
1944
+ │ --print-args Print the command line arguments, │
1945
+ │ including variables taken from the │
1946
+ │ configuration file. │
1638
1947
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1639
1948
 
1640
1949
  ```
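
The `[env var: ...]` entries in the help text above mean each option can also come from the environment instead of a flag. A minimal sketch of preparing such an environment before launching `agent-cli chat` — the variable names are taken from the help text, the values are illustrative:

```python
import os

# Each [env var: ...] entry substitutes for its corresponding flag,
# e.g. ASR_PROVIDER for --asr-provider, LOG_LEVEL for --log-level.
overrides = {
    "ASR_PROVIDER": "openai",
    "LLM_PROVIDER": "openai",
    "TTS_PROVIDER": "kokoro",
    "LOG_LEVEL": "debug",
}
os.environ.update(overrides)

# A spawned process inherits these settings, roughly equivalent to:
#   agent-cli chat --asr-provider openai --llm-provider openai \
#     --tts-provider kokoro --log-level debug --tts
# subprocess.run(["agent-cli", "chat", "--tts"])  # commented: needs the CLI installed
```

Explicit flags normally take precedence over the environment; check `--print-args` to see which values were actually picked up.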
@@ -1680,25 +1989,68 @@ uv tool install "agent-cli[vad]" -p 3.13
1680
1989
 
1681
1990
  Usage: agent-cli rag-proxy [OPTIONS]
1682
1991
 
1683
- Start the RAG (Retrieval-Augmented Generation) Proxy Server.
1992
+ Start a RAG proxy server that enables "chat with your documents".
1993
+
1994
+ Watches a folder for documents, indexes them into a vector store, and provides an
1995
+ OpenAI-compatible API at /v1/chat/completions. When you send a chat request, the server
1996
+ retrieves relevant document chunks and injects them as context before forwarding to your
1997
+ LLM backend.
1998
+
1999
+ Quick start:
2000
+
2001
+ • agent-cli rag-proxy — Start with defaults (./rag_docs, OpenAI-compatible API)
2002
+ • agent-cli rag-proxy --docs-folder ~/notes — Index your notes folder
2003
+
2004
+ How it works:
1684
2005
 
1685
- This server watches a folder for documents, indexes them, and provides an
1686
- OpenAI-compatible API that proxies requests to a backend LLM (like llama.cpp), injecting
1687
- relevant context from the documents.
2006
+ 1 Documents in --docs-folder are chunked, embedded, and stored in ChromaDB
2007
+ 2 A file watcher auto-reindexes when files change
2008
+ 3 Chat requests trigger a semantic search for relevant chunks
2009
+ 4 Retrieved context is injected into the prompt before forwarding to the LLM
2010
+ 5 Responses include a rag_sources field listing which documents were used
2011
+
2012
+ Supported file formats:
2013
+
2014
+ Text: .txt, .md, .json, .py, .js, .ts, .yaml, .toml, .rst, etc. Rich documents (via
2015
+ MarkItDown): .pdf, .docx, .pptx, .xlsx, .html, .csv
2016
+
2017
+ API endpoints:
2018
+
2019
+ • POST /v1/chat/completions — Main chat endpoint (OpenAI-compatible)
2020
+ • GET /health — Health check with configuration info
2021
+ • GET /files — List indexed files with chunk counts
2022
+ • POST /reindex — Trigger manual reindex
2023
+ • All other paths are proxied to the LLM backend
2024
+
2025
+ Per-request overrides (in JSON body):
2026
+
2027
+ • rag_top_k: Override --limit for this request
2028
+ • rag_enable_tools: Override --rag-tools for this request
1688
2029
 
1689
2030
  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
1690
2031
  │ --help -h Show this message and exit. │
1691
2032
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1692
2033
  ╭─ RAG Configuration ────────────────────────────────────────────────────────────────────╮
1693
- │ --docs-folder PATH Folder to watch for documents
2034
+ │ --docs-folder PATH Folder to watch for documents. Files are
2035
+ │ auto-indexed on startup and when changed. │
2036
+ │ Must not overlap with --chroma-path. │
1694
2037
  │ [default: ./rag_docs] │
1695
- │ --chroma-path PATH Path to ChromaDB persistence directory
2038
+ │ --chroma-path PATH ChromaDB storage directory for vector
2039
+ │ embeddings. Must be separate from │
2040
+ │ --docs-folder to avoid indexing database │
2041
+ │ files. │
1696
2042
  │ [default: ./rag_db] │
1697
2043
  │ --limit INTEGER Number of document chunks to retrieve per │
1698
- │ query.
2044
+ │ query. Higher values provide more context
2045
+ │ but use more tokens. Can be overridden │
2046
+ │ per-request via rag_top_k in the JSON │
2047
+ │ body. │
1699
2048
  │ [default: 3] │
1700
- │ --rag-tools --no-rag-tools Allow agent to fetch full documents when
1701
- snippets are insufficient.
2049
+ │ --rag-tools --no-rag-tools Enable read_full_document() tool so the
2050
+ LLM can request full document content when
2051
+ │ retrieved snippets are insufficient. Can │
2052
+ │ be overridden per-request via │
2053
+ │ rag_enable_tools in the JSON body. │
1702
2054
  │ [default: rag-tools] │
1703
2055
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1704
2056
  ╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
@@ -1716,15 +2068,18 @@ uv tool install "agent-cli[vad]" -p 3.13
1716
2068
  ╭─ Server Configuration ─────────────────────────────────────────────────────────────────╮
1717
2069
  │ --host TEXT Host/IP to bind API servers to. │
1718
2070
  │ [default: 0.0.0.0] │
1719
- │ --port INTEGER Port to bind to
2071
+ │ --port INTEGER Port for the RAG proxy API (e.g.,
2072
+ │ http://localhost:8000/v1/chat/completions). │
1720
2073
  │ [default: 8000] │
1721
2074
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1722
2075
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
1723
- │ --log-level TEXT Set logging level.
1724
- [default: INFO]
1725
- --config TEXT Path to a TOML configuration file.
1726
- │ --print-args Print the command line arguments, including variables taken
1727
- from the configuration file.
2076
+ │ --log-level [debug|info|warning|error] Set logging level.
2077
+ [env var: LOG_LEVEL]
2078
+ [default: info]
2079
+ │ --config TEXT Path to a TOML configuration file.
2080
+ --print-args Print the command line arguments,
2081
+ │ including variables taken from the │
2082
+ │ configuration file. │
1728
2083
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1729
2084
 
1730
2085
  ```
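
The per-request overrides described above (`rag_top_k`, `rag_enable_tools`) travel as extra keys in the regular chat-completions body. A minimal client-side sketch, assuming the proxy is on the default port 8000; the model name is a placeholder for whatever your backend expects, and only payload construction is shown:

```python
import json

# OpenAI-compatible chat payload with the RAG proxy's extra fields.
# The proxy reads rag_top_k / rag_enable_tools, runs retrieval, and
# forwards the rest of the request to the LLM backend.
payload = {
    "model": "backend-model",  # hypothetical backend model name
    "messages": [
        {"role": "user", "content": "What do the deployment notes say?"},
    ],
    "rag_top_k": 5,             # override --limit for this request
    "rag_enable_tools": False,  # override --rag-tools for this request
}

body = json.dumps(payload)
# POST this body to http://localhost:8000/v1/chat/completions;
# per the docs above, the response includes a rag_sources field.
```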
@@ -1804,41 +2159,61 @@ The `memory proxy` command is the core feature—a middleware server that gives
1804
2159
  5 Extracts new facts from the conversation in the background and updates the long-term
1805
2160
  memory store (including handling contradictions).
1806
2161
 
1807
- Use this to give "long-term memory" to any OpenAI-compatible application. Point your
1808
- client's base URL to http://localhost:8100/v1.
2162
+ Example:
2163
+
2164
+
2165
+ # Start proxy pointing to local Ollama
2166
+ agent-cli memory proxy --openai-base-url http://localhost:11434/v1
2167
+
2168
+ # Then configure your chat client to use http://localhost:8100/v1
2169
+ # as its OpenAI base URL. All requests flow through the memory proxy.
2170
+
2171
+
2172
+ Per-request overrides: Clients can include these fields in the request body: memory_id
2173
+ (conversation ID), memory_top_k, memory_recency_weight, memory_score_threshold.
1809
2174
 
1810
2175
  ╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
1811
2176
  │ --help -h Show this message and exit. │
1812
2177
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1813
2178
  ╭─ Memory Configuration ─────────────────────────────────────────────────────────────────╮
1814
- │ --memory-path PATH Path to the memory store (files +
1815
- derived vector index).
2179
+ │ --memory-path PATH Directory for memory storage.
2180
+ Contains entries/ (Markdown
2181
+ │ files) and chroma/ (vector │
2182
+ │ index). Created automatically if │
2183
+ │ it doesn't exist. │
1816
2184
  │ [default: ./memory_db] │
1817
- │ --default-top-k INTEGER Number of memory entries to
1818
- retrieve per query.
2185
+ │ --default-top-k INTEGER Number of relevant memories to
2186
+ inject into each request. Higher
2187
+ │ values provide more context but │
2188
+ │ increase token usage. │
1819
2189
  │ [default: 5] │
1820
- │ --max-entries INTEGER Maximum stored memory entries per │
1821
- conversation (excluding summary).
2190
+ │ --max-entries INTEGER Maximum entries per conversation
2191
+ before oldest are evicted.
2192
+ │ Summaries are preserved │
2193
+ │ separately. │
1822
2194
  │ [default: 500] │
1823
2195
  │ --mmr-lambda FLOAT MMR lambda (0-1): higher favors │
1824
2196
  │ relevance, lower favors │
1825
2197
  │ diversity. │
1826
2198
  │ [default: 0.7] │
1827
- │ --recency-weight FLOAT Recency score weight (0.0-1.0).
1828
- Controls freshness vs. relevance. │
1829
- Default 0.2 (20% recency, 80%
1830
- │ semantic relevance). │
2199
+ │ --recency-weight FLOAT Weight for recency vs semantic
2200
+ relevance (0.0-1.0). At 0.2: 20%
2201
+ │ recency, 80% semantic similarity.
1831
2202
  │ [default: 0.2] │
1832
2203
  │ --score-threshold FLOAT Minimum semantic relevance │
1833
2204
  │ threshold (0.0-1.0). Memories │
1834
2205
  │ below this score are discarded to │
1835
2206
  │ reduce noise. │
1836
2207
  │ [default: 0.35] │
1837
- │ --summarization --no-summarization Enable automatic fact extraction
1838
- and summaries.
2208
+ │ --summarization --no-summarization Extract facts and generate
2209
+ summaries after each turn using
2210
+ │ the LLM. Disable to only store │
2211
+ │ raw conversation turns. │
1839
2212
  │ [default: summarization] │
1840
- │ --git-versioning --no-git-versioning Enable automatic git commit of
1841
- memory changes.
2213
+ │ --git-versioning --no-git-versioning Auto-commit memory changes to
2214
+ git. Initializes a repo in
2215
+ │ --memory-path if needed. Provides │
2216
+ │ full history of memory evolution. │
1842
2217
  │ [default: git-versioning] │
1843
2218
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1844
2219
  ╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
@@ -1860,11 +2235,13 @@ The `memory proxy` command is the core feature—a middleware server that gives
1860
2235
  │ [default: 8100] │
1861
2236
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1862
2237
  ╭─ General Options ──────────────────────────────────────────────────────────────────────╮
1863
- │ --log-level TEXT Set logging level.
1864
- [default: INFO]
1865
- --config TEXT Path to a TOML configuration file.
1866
- │ --print-args Print the command line arguments, including variables taken
1867
- from the configuration file.
2238
+ │ --log-level [debug|info|warning|error] Set logging level.
2239
+ [env var: LOG_LEVEL]
2240
+ [default: info]
2241
+ │ --config TEXT Path to a TOML configuration file.
2242
+ --print-args Print the command line arguments,
2243
+ │ including variables taken from the │
2244
+ │ configuration file. │
1868
2245
  ╰────────────────────────────────────────────────────────────────────────────────────────╯
1869
2246
 
1870
2247
  ```
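
Likewise, the memory proxy's per-request fields (`memory_id`, `memory_top_k`, `memory_recency_weight`, `memory_score_threshold`) are plain extra keys in the chat body. A sketch of a client-side payload, assuming the proxy runs on its default port 8100; the model name is illustrative:

```python
import json

# Payload aimed at the memory proxy; the memory_* keys are the
# per-request overrides listed above and are consumed by the proxy
# before the request is forwarded to the backend LLM.
payload = {
    "model": "gpt-5-mini",  # illustrative backend model
    "messages": [
        {"role": "user", "content": "What deadlines am I tracking?"},
    ],
    "memory_id": "work",             # conversation namespace
    "memory_top_k": 3,               # override --default-top-k
    "memory_recency_weight": 0.5,    # favor fresher memories
    "memory_score_threshold": 0.35,  # drop weakly related memories
}

body = json.dumps(payload)
# POST this body to http://localhost:8100/v1/chat/completions
```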
@@ -1949,12 +2326,16 @@ agent-cli memory add -c work "Project deadline is Friday"
1949
2326
  │ for stdin. Supports JSON array, │
1950
2327
  │ JSON object with 'memories' key, │
1951
2328
  │ or plain text (one per line). │
1952
- │ --conversation-id -c TEXT Conversation ID to add memories
1953
- to.
2329
+ │ --conversation-id -c TEXT Conversation namespace for these
2330
+ memories. Memories are retrieved
2331
+ │ per-conversation unless shared │
2332
+ │ globally. │
1954
2333
  │ [default: default] │
1955
- │ --memory-path PATH Path to the memory store.
2334
+ │ --memory-path PATH Directory for memory storage (same
2335
+ │ as memory proxy --memory-path). │
1956
2336
  │ [default: ./memory_db] │
1957
- │ --git-versioning --no-git-versioning Commit changes to git.
2337
+ │ --git-versioning --no-git-versioning Auto-commit changes to git for
2338
+ │ version history. │
1958
2339
  │ [default: git-versioning] │
1959
2340
  │ --help -h Show this message and exit. │
1960
2341
  ╰────────────────────────────────────────────────────────────────────────────────────────╯