ovos-gui-api-client 0.0.2a1__tar.gz

@@ -0,0 +1,222 @@
1
+ Metadata-Version: 2.4
2
+ Name: ovos-gui-api-client
3
+ Version: 0.0.2a1
4
+ Summary: Public client interface for OVOS skills and plugins to drive the GUI layer
5
+ Author-email: OpenVoiceOS <jarbasai@mailfence.com>
6
+ License: Apache-2.0
7
+ Project-URL: Homepage, https://github.com/OpenVoiceOS/ovos-gui-api-client
8
+ Project-URL: Repository, https://github.com/OpenVoiceOS/ovos-gui-api-client
9
+ Project-URL: Bug Tracker, https://github.com/OpenVoiceOS/ovos-gui-api-client/issues
10
+ Keywords: ovos,mycroft,gui,client,skill
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: Apache Software License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Programming Language :: Python :: 3.13
19
+ Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
20
+ Requires-Python: >=3.10
21
+ Description-Content-Type: text/markdown
22
+ Requires-Dist: ovos-config
23
+ Requires-Dist: ovos-utils
24
+ Requires-Dist: ovos-bus-client<3.0.0,>=1.0.0
25
+ Provides-Extra: test
26
+ Requires-Dist: pytest>=7.0; extra == "test"
27
+ Requires-Dist: pytest-cov; extra == "test"
28
+
29
+ # OVOS GUI — Page Template Design
30
+
31
+ ## Philosophy
32
+
33
+ A template defines **what kind of information** is being presented, not how it looks.
34
+ The display layer (Qt/web/terminal/etc.) owns all rendering decisions.
35
+ Skills own only the semantic data they provide.
36
+
37
+ ### Voice-first constraint
38
+
39
+ OVOS is a voice-first platform. The GUI is a *companion* to speech, not a
40
+ replacement for it.
41
+
42
+ - **Touch is a shortcut, never the only path.** Every interaction a user can
43
+ perform by touch must also be completable by voice.
44
+ - **Some clients are display-only** (no touch, no keyboard). Templates must
45
+ never assume input capability.
46
+ - **Skills must never block on a GUI event.** If a skill asks a question, it
47
+ listens for the spoken answer simultaneously. A GUI touch shortcut just
48
+ fires the same bus message the spoken answer would.
49
+
50
+ A template should be added when:
51
+ - The semantic structure is meaningfully distinct from existing templates, **and**
52
+ - It is needed by more than one unrelated skill (no single-skill templates) **or**
53
+ by an essential skill (weather, clock)
54
+
55
+ A template should be rejected when:
56
+ - It is a visual variation of an existing template (that is the display layer's job)
57
+ - It can be composed from existing templates in sequence
58
+ - Its data model is a strict subset of a broader template
59
+
60
+ ---
61
+
62
+ ## Proposed Template Set
63
+
64
+ ### System group — managed by ovos-gui, not skills
65
+
66
+ | Template | Purpose | Key session data |
67
+ |---|---|---|
68
+ | `SYSTEM_idle` | Resting / ambient screen | *(none — display layer decides)* |
69
+ | `SYSTEM_loading` | Indeterminate progress | `label: str` |
70
+ | `SYSTEM_status` | Terminal success / failure | `success: bool`, `label: str` |
71
+ | `SYSTEM_error` | Error with optional detail | `label: str`, `detail: str?` |
72
+
73
+ `SYSTEM_idle` is **reserved** — skills must not display it directly.
74
+
75
+ ---
76
+
77
+ ### Content group — read-only information
78
+
79
+ | Template | Purpose | Key session data |
80
+ |---|---|---|
81
+ | `SYSTEM_text` | Long-form text, auto-paginated | `text: str`, `title: str?` |
82
+ | `SYSTEM_image` | Static image | `image: url`, `title: str?`, `caption: str?`, `fill: FillMode?`, `background_color: str?` |
83
+ | `SYSTEM_animated_image` | Animated image (GIF / WebP) | same as `SYSTEM_image` |
84
+ | `SYSTEM_list` | Scrollable list of labelled items | `items: List[{title, subtitle?, image?}]`, `title: str?` |
85
+ | `SYSTEM_grid` | 2-D tile grid of image-primary items | `items: List[{image, title?}]`, `title: str?` |
86
+ | `SYSTEM_table` | Columnar data table with named headers | `columns: List[str]`, `rows: List[List[Any]]`, `title: str?` |
87
+ | `SYSTEM_html` | Rendered HTML string | `html: str`, `resource_url: str?` |
88
+ | `SYSTEM_url` | Full web page | `url: str` |
89
+
90
+ **Why three distinct collection templates?**
91
+
92
+ The data models are semantically different, not just visually different:
93
+
94
+ | | Primary axis | Item structure | Layout authority |
95
+ |---|---|---|---|
96
+ | `list` | Reading order | Hierarchical — title + subtitle + thumbnail | Fixed: single column |
97
+ | `grid` | Visual equality | Image-primary — image required, title optional | Display layer (adapts to screen) |
98
+ | `table` | Column identity | Relational — named columns, typed rows | Fixed: column × row |
99
+
100
+ Merging them would force every display layer to infer intent from data shape,
101
+ which is fragile. Keeping them separate makes the skill's intent explicit.
102
+
103
+ ---
104
+
105
+ ### Media group — time-based playback
106
+
107
+ | Template | Purpose | Key session data |
108
+ |---|---|---|
109
+ | `SYSTEM_audio_player` | Now-playing card (audio) | `title: str`, `artist: str?`, `album: str?`, `image: url?`, `position: float`, `duration: float`, `playing: bool` |
110
+ | `SYSTEM_video_player` | In-GUI video playback | `uri: str`, `title: str?`, `playing: bool` |
111
+
112
+ **Why separate audio and video?**
113
+ Audio playback shows a rich *metadata card* while the actual audio plays through
114
+ the sound system — there is no video stream involved. Video playback is a raw
115
+ media surface. Conflating them forces every display layer to handle both cases
116
+ in one template.
117
+
118
+ ---
119
+
120
+ ### Utility group — common single-purpose views
121
+
122
+ | Template | Purpose | Key session data |
123
+ |---|---|---|
124
+ | `SYSTEM_clock` | Current time display | *(self-updating, no data needed)* |
125
+ | `SYSTEM_timer` | Countdown / count-up display | `duration: int` (seconds), `label: str?` |
126
+ | `SYSTEM_weather` | Weather summary card | `current_temp`, `min_temp`, `max_temp`, `condition: str`, `icon: url?`, `location: str?` |
127
+ | `SYSTEM_map` | Geographic location | `latitude: float`, `longitude: float`, `zoom: int?`, `label: str?` |
128
+
129
+ **Why `SYSTEM_timer`?**
130
+ Timers are pervasive (cooking, pomodoro, alarms) and have a distinct real-time
131
+ countdown UI that cannot be expressed by `SYSTEM_text` without the skill
132
+ manually updating the displayed string every second.
133
+
134
+ **Why `SYSTEM_map`?**
135
+ Navigation, weather location, business lookup — all need a spatial view.
136
+ A URL-embedded map is fragile (requires network, leaks provider choice).
137
+
138
+ ---
139
+
140
+ ### Dialogue group — visual accompaniment to an active voice dialogue
141
+
142
+ OVOS is **voice-first**. The GUI never drives an interaction on its own.
143
+ These templates display what is currently being asked through speech so the
144
+ user can follow along. On capable devices a touch shortcut may be offered as
145
+ a convenience, but:
146
+
147
+ - Voice is always the primary (and on display-only devices, the only) path.
148
+ - A skill must never block waiting for a GUI event — it must always be
149
+ listening for the spoken answer in parallel.
150
+ - The display layer decides whether to render touch targets at all.
151
+
152
+ | Template | Purpose | Key session data |
153
+ |---|---|---|
154
+ | `SYSTEM_confirm` | Shows the yes/no question being spoken | `question: str` |
155
+ | `SYSTEM_select` | Shows the list of options being spoken | `prompt: str?`, `items: List[{label, value}]` |
156
+
157
+ `SYSTEM_input` is **excluded** — free-text keyboard entry is not a voice-first
158
+ interaction and cannot work on display-only devices. Skills that need text
159
+ input must use the speech layer.
160
+
161
+ ---
162
+
163
+ ### Avatar group — embodied assistant states
164
+
165
+ | Template | Purpose | Key session data |
166
+ |---|---|---|
167
+ | `SYSTEM_face` | Animated avatar face | `sleeping: bool` |
168
+
169
+ This template is intentionally minimal — the display layer decides what the
170
+ avatar looks like. Additional emotional states (thinking, listening) are a
171
+ display-layer concern, not a session-data concern.
172
+
173
+ ---
174
+
175
+ ## Templates deliberately excluded
176
+
177
+ | Rejected | Reason |
178
+ |---|---|
179
+ | `SYSTEM_slideshow` | Compose by showing `SYSTEM_image` pages at `index` |
180
+ | `SYSTEM_notification` | This is a system-layer concept, not a page template |
181
+ | `SYSTEM_qr_code` | Renderable as `SYSTEM_image`; generation belongs in the skill |
182
+ | `SYSTEM_chart` / `SYSTEM_graph` | Too display-layer-specific; use `SYSTEM_html` or `SYSTEM_image` |
183
+ | `SYSTEM_input` | Free-text keyboard entry breaks the voice-first contract and cannot work on display-only devices |
184
+ | Per-skill templates (calendar, spotify, etc.) | Violates the "multiple unrelated skills" rule |
185
+
186
+ ---
187
+
188
+ ## Summary
189
+
190
+ ```
191
+ SYSTEM_idle (reserved)
192
+
193
+ SYSTEM_loading
194
+ SYSTEM_status
195
+ SYSTEM_error
196
+
197
+ SYSTEM_text
198
+ SYSTEM_image
199
+ SYSTEM_animated_image
200
+ SYSTEM_list
201
+ SYSTEM_grid
202
+ SYSTEM_table
203
+
204
+ SYSTEM_html
205
+ SYSTEM_url
206
+
207
+ SYSTEM_audio_player
208
+ SYSTEM_video_player
209
+
210
+ SYSTEM_clock
211
+ SYSTEM_timer
212
+ SYSTEM_weather
213
+ SYSTEM_map
214
+
215
+ SYSTEM_confirm
216
+ SYSTEM_select
217
+
218
+ SYSTEM_face
219
+ ```
220
+
221
+ 21 templates total. Every one is justified by cross-skill usage and a data
222
+ model that cannot be reduced to an existing template.
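As a reading aid alongside the diff, the `SYSTEM_table` row of the content group can be sketched as a small validator. This is only an illustration under stated assumptions: the function `validate_table_data` is hypothetical and is not part of ovos-gui-api-client, and the rule that each row has one cell per column is an assumption inferred from "named columns, typed rows". Only the field names `columns`, `rows`, and `title` come from the README.

```python
from typing import Any, Dict, List


def validate_table_data(data: Dict[str, Any]) -> List[str]:
    """Return the problems found in SYSTEM_table session data.

    Hypothetical helper for illustration; not an API of the package.
    """
    problems: List[str] = []
    columns = data.get("columns")
    rows = data.get("rows")

    # `columns: List[str]` in the README table.
    if not isinstance(columns, list) or not all(isinstance(c, str) for c in columns):
        problems.append("columns must be a list of strings")
        columns = None

    # `rows: List[List[Any]]` in the README table.
    if not isinstance(rows, list) or not all(isinstance(r, list) for r in rows):
        problems.append("rows must be a list of lists")
        rows = None

    # Assumption: every row carries exactly one cell per named column.
    if columns is not None and rows is not None:
        for i, row in enumerate(rows):
            if len(row) != len(columns):
                problems.append(
                    f"row {i} has {len(row)} cells, expected {len(columns)}"
                )

    # `title: str?` is optional, so only check it when present.
    title = data.get("title")
    if title is not None and not isinstance(title, str):
        problems.append("title must be a string when present")

    return problems
```

A list or grid payload would fail these checks, which is the point the README makes about keeping the three collection templates separate: the skill states its intent by choosing the template, and the data shape can then be checked against that choice.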
@@ -0,0 +1,194 @@
1
+ # OVOS GUI — Page Template Design
2
+
3
+ ## Philosophy
4
+
5
+ A template defines **what kind of information** is being presented, not how it looks.
6
+ The display layer (Qt/web/terminal/etc.) owns all rendering decisions.
7
+ Skills own only the semantic data they provide.
8
+
9
+ ### Voice-first constraint
10
+
11
+ OVOS is a voice-first platform. The GUI is a *companion* to speech, not a
12
+ replacement for it.
13
+
14
+ - **Touch is a shortcut, never the only path.** Every interaction a user can
15
+ perform by touch must also be completable by voice.
16
+ - **Some clients are display-only** (no touch, no keyboard). Templates must
17
+ never assume input capability.
18
+ - **Skills must never block on a GUI event.** If a skill asks a question, it
19
+ listens for the spoken answer simultaneously. A GUI touch shortcut just
20
+ fires the same bus message the spoken answer would.
21
+
22
+ A template should be added when:
23
+ - The semantic structure is meaningfully distinct from existing templates, **and**
24
+ - It is needed by more than one unrelated skill (no single-skill templates) **or**
25
+ by an essential skill (weather, clock)
26
+
27
+ A template should be rejected when:
28
+ - It is a visual variation of an existing template (that is the display layer's job)
29
+ - It can be composed from existing templates in sequence
30
+ - Its data model is a strict subset of a broader template
31
+
32
+ ---
33
+
34
+ ## Proposed Template Set
35
+
36
+ ### System group — managed by ovos-gui, not skills
37
+
38
+ | Template | Purpose | Key session data |
39
+ |---|---|---|
40
+ | `SYSTEM_idle` | Resting / ambient screen | *(none — display layer decides)* |
41
+ | `SYSTEM_loading` | Indeterminate progress | `label: str` |
42
+ | `SYSTEM_status` | Terminal success / failure | `success: bool`, `label: str` |
43
+ | `SYSTEM_error` | Error with optional detail | `label: str`, `detail: str?` |
44
+
45
+ `SYSTEM_idle` is **reserved** — skills must not display it directly.
46
+
47
+ ---
48
+
49
+ ### Content group — read-only information
50
+
51
+ | Template | Purpose | Key session data |
52
+ |---|---|---|
53
+ | `SYSTEM_text` | Long-form text, auto-paginated | `text: str`, `title: str?` |
54
+ | `SYSTEM_image` | Static image | `image: url`, `title: str?`, `caption: str?`, `fill: FillMode?`, `background_color: str?` |
55
+ | `SYSTEM_animated_image` | Animated image (GIF / WebP) | same as `SYSTEM_image` |
56
+ | `SYSTEM_list` | Scrollable list of labelled items | `items: List[{title, subtitle?, image?}]`, `title: str?` |
57
+ | `SYSTEM_grid` | 2-D tile grid of image-primary items | `items: List[{image, title?}]`, `title: str?` |
58
+ | `SYSTEM_table` | Columnar data table with named headers | `columns: List[str]`, `rows: List[List[Any]]`, `title: str?` |
59
+ | `SYSTEM_html` | Rendered HTML string | `html: str`, `resource_url: str?` |
60
+ | `SYSTEM_url` | Full web page | `url: str` |
61
+
62
+ **Why three distinct collection templates?**
63
+
64
+ The data models are semantically different, not just visually different:
65
+
66
+ | | Primary axis | Item structure | Layout authority |
67
+ |---|---|---|---|
68
+ | `list` | Reading order | Hierarchical — title + subtitle + thumbnail | Fixed: single column |
69
+ | `grid` | Visual equality | Image-primary — image required, title optional | Display layer (adapts to screen) |
70
+ | `table` | Column identity | Relational — named columns, typed rows | Fixed: column × row |
71
+
72
+ Merging them would force every display layer to infer intent from data shape,
73
+ which is fragile. Keeping them separate makes the skill's intent explicit.
74
+
75
+ ---
76
+
77
+ ### Media group — time-based playback
78
+
79
+ | Template | Purpose | Key session data |
80
+ |---|---|---|
81
+ | `SYSTEM_audio_player` | Now-playing card (audio) | `title: str`, `artist: str?`, `album: str?`, `image: url?`, `position: float`, `duration: float`, `playing: bool` |
82
+ | `SYSTEM_video_player` | In-GUI video playback | `uri: str`, `title: str?`, `playing: bool` |
83
+
84
+ **Why separate audio and video?**
85
+ Audio playback shows a rich *metadata card* while the actual audio plays through
86
+ the sound system — there is no video stream involved. Video playback is a raw
87
+ media surface. Conflating them forces every display layer to handle both cases
88
+ in one template.
89
+
90
+ ---
91
+
92
+ ### Utility group — common single-purpose views
93
+
94
+ | Template | Purpose | Key session data |
95
+ |---|---|---|
96
+ | `SYSTEM_clock` | Current time display | *(self-updating, no data needed)* |
97
+ | `SYSTEM_timer` | Countdown / count-up display | `duration: int` (seconds), `label: str?` |
98
+ | `SYSTEM_weather` | Weather summary card | `current_temp`, `min_temp`, `max_temp`, `condition: str`, `icon: url?`, `location: str?` |
99
+ | `SYSTEM_map` | Geographic location | `latitude: float`, `longitude: float`, `zoom: int?`, `label: str?` |
100
+
101
+ **Why `SYSTEM_timer`?**
102
+ Timers are pervasive (cooking, pomodoro, alarms) and have a distinct real-time
103
+ countdown UI that cannot be expressed by `SYSTEM_text` without the skill
104
+ manually updating the displayed string every second.
105
+
106
+ **Why `SYSTEM_map`?**
107
+ Navigation, weather location, business lookup — all need a spatial view.
108
+ A URL-embedded map is fragile (requires network, leaks provider choice).
109
+
110
+ ---
111
+
112
+ ### Dialogue group — visual accompaniment to an active voice dialogue
113
+
114
+ OVOS is **voice-first**. The GUI never drives an interaction on its own.
115
+ These templates display what is currently being asked through speech so the
116
+ user can follow along. On capable devices a touch shortcut may be offered as
117
+ a convenience, but:
118
+
119
+ - Voice is always the primary (and on display-only devices, the only) path.
120
+ - A skill must never block waiting for a GUI event — it must always be
121
+ listening for the spoken answer in parallel.
122
+ - The display layer decides whether to render touch targets at all.
123
+
124
+ | Template | Purpose | Key session data |
125
+ |---|---|---|
126
+ | `SYSTEM_confirm` | Shows the yes/no question being spoken | `question: str` |
127
+ | `SYSTEM_select` | Shows the list of options being spoken | `prompt: str?`, `items: List[{label, value}]` |
128
+
129
+ `SYSTEM_input` is **excluded** — free-text keyboard entry is not a voice-first
130
+ interaction and cannot work on display-only devices. Skills that need text
131
+ input must use the speech layer.
132
+
133
+ ---
134
+
135
+ ### Avatar group — embodied assistant states
136
+
137
+ | Template | Purpose | Key session data |
138
+ |---|---|---|
139
+ | `SYSTEM_face` | Animated avatar face | `sleeping: bool` |
140
+
141
+ This template is intentionally minimal — the display layer decides what the
142
+ avatar looks like. Additional emotional states (thinking, listening) are a
143
+ display-layer concern, not a session-data concern.
144
+
145
+ ---
146
+
147
+ ## Templates deliberately excluded
148
+
149
+ | Rejected | Reason |
150
+ |---|---|
151
+ | `SYSTEM_slideshow` | Compose by showing `SYSTEM_image` pages at `index` |
152
+ | `SYSTEM_notification` | This is a system-layer concept, not a page template |
153
+ | `SYSTEM_qr_code` | Renderable as `SYSTEM_image`; generation belongs in the skill |
154
+ | `SYSTEM_chart` / `SYSTEM_graph` | Too display-layer-specific; use `SYSTEM_html` or `SYSTEM_image` |
155
+ | `SYSTEM_input` | Free-text keyboard entry breaks the voice-first contract and cannot work on display-only devices |
156
+ | Per-skill templates (calendar, spotify, etc.) | Violates the "multiple unrelated skills" rule |
157
+
158
+ ---
159
+
160
+ ## Summary
161
+
162
+ ```
163
+ SYSTEM_idle (reserved)
164
+
165
+ SYSTEM_loading
166
+ SYSTEM_status
167
+ SYSTEM_error
168
+
169
+ SYSTEM_text
170
+ SYSTEM_image
171
+ SYSTEM_animated_image
172
+ SYSTEM_list
173
+ SYSTEM_grid
174
+ SYSTEM_table
175
+
176
+ SYSTEM_html
177
+ SYSTEM_url
178
+
179
+ SYSTEM_audio_player
180
+ SYSTEM_video_player
181
+
182
+ SYSTEM_clock
183
+ SYSTEM_timer
184
+ SYSTEM_weather
185
+ SYSTEM_map
186
+
187
+ SYSTEM_confirm
188
+ SYSTEM_select
189
+
190
+ SYSTEM_face
191
+ ```
192
+
193
+ 21 templates total. Every one is justified by cross-skill usage and a data
194
+ model that cannot be reduced to an existing template.
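The voice-first rule stated in the README (a touch shortcut fires the same bus message the spoken answer would, and a skill never blocks on a GUI event) can be illustrated with a minimal sketch. Everything here is a stand-in: `TinyBus`, `on_speech`, `on_touch`, and `ask_confirm` are hypothetical names, and the queue-based bus is not the ovos-bus-client API.

```python
import queue
from typing import Optional


class TinyBus:
    """In-memory stand-in for a message bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._answers: "queue.Queue[str]" = queue.Queue()

    def emit_answer(self, value: str) -> None:
        # Single answer channel shared by every input source.
        self._answers.put(value)

    def wait_for_answer(self, timeout: float) -> Optional[str]:
        try:
            return self._answers.get(timeout=timeout)
        except queue.Empty:
            return None


def on_speech(bus: TinyBus, utterance: str) -> None:
    """Spoken answer: the primary path, available on every device."""
    bus.emit_answer(utterance)


def on_touch(bus: TinyBus, value: str) -> None:
    """Touch shortcut: emits the same message a spoken answer would."""
    bus.emit_answer(value)


def ask_confirm(bus: TinyBus, timeout: float = 0.1) -> Optional[bool]:
    """Wait for a yes/no answer from any source, never for a GUI event alone.

    Returns None on timeout, so a display-only client cannot stall the skill.
    """
    answer = bus.wait_for_answer(timeout)
    if answer is None:
        return None
    return answer.strip().lower() == "yes"
```

Because the skill only ever waits on the shared answer channel, it does not need to know whether the display rendered touch targets at all, which matches the README's statement that the display layer makes that decision.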