speek-0.0.2.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
speek-0.0.2/LICENSE ADDED
@@ -0,0 +1,19 @@
+ Copyright (c) 2024 Dongyeop Lee
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
speek-0.0.2/PKG-INFO ADDED
@@ -0,0 +1,122 @@
+ Metadata-Version: 2.4
+ Name: speek
+ Version: 0.0.2
+ Summary: Peek into slurm's resource info such as GPU availability, usage per user, job status, and more.
+ Author-email: Dongyeop Lee <dylee23@postech.ac.kr>
+ License: Copyright (c) 2024 Dongyeop Lee
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+ Project-URL: Repository, https://github.com/edong6768/speek.git
+ Keywords: slurm
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Intended Audience :: System Administrators
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Topic :: System :: Monitoring
+ Classifier: Topic :: Utilities
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: rich
+ Dynamic: license-file
+
+ # 🔍 speek
+
+ **speek** is a suite of SLURM cluster monitoring tools — from quick one-shot snapshots to a full interactive TUI.
+
+ ## Installation
+
+ ```sh
+ pip install speek
+ ```
+
+ For the latest development version:
+ ```sh
+ pip install --pre speek
+ ```
+
+ ## Commands
+
+ | Command | Description |
+ |---------|-------------|
+ | `speek0` | Classic one-shot cluster overview — GPU availability, per-user usage, job status |
+ | `speek-` | Compact snapshot — per-model GPU bars, trends, pending pressure |
+ | `speek+` | Full interactive TUI — queue, nodes, users, stats, events, shell |
+
+ ## speek0 — Classic Overview
+
+ ```sh
+ speek0 [-u USER] [-f FILE] [-t T_AVAIL]
+ ```
+
+ | Option | Description |
+ |--------|-------------|
+ | `-u USER` | Highlight a specific user (default: self) |
+ | `-f FILE` | User info CSV file |
+ | `-t T_AVAIL` | Time window for upcoming release, e.g. `5 m`, `1 h` |
+
+ Shows a table of GPU usage per partition, ranked users with `đŸ„‡đŸ„ˆđŸ„‰`, utilization-colored counts, and your current jobs.
+
+ ## speek- — Compact Snapshot
+
+ ```sh
+ speek- [-u USER]
+ ```
+
+ Per-GPU-model view with utilization bars, free/total counts, pending pressure (`⏾N`), availability trends (`↑↓`), and your running/pending jobs. Detects down nodes and shows them as DEAD.
+
+ ## speek+ — Interactive TUI
+
+ ```sh
+ speek+
+ ```
+
+ Full-featured Textual TUI with:
+
+ - **Cluster** — speek0-style usage table (tab 1)
+ - **Queue** — all cluster jobs grouped by partition, foldable
+ - **Nodes** — per-partition node status with usage bars
+ - **Users** — per-user GPU usage, fairshare, per-partition breakdown
+ - **Stats** — GPU usage charts, per-user stacked view, issue dashboard
+ - **Logs** — session CLI output (not persisted)
+ - **Settings** — theme, refresh rates, cache management, log scanning
+ - **Info** — cluster probe results, scheduling factors, error detection rules
+ - **Help** — keybindings reference
+
+ ### Features
+
+ - 70+ color themes (base16 standard)
+ - OOM and error detection (11 error types) with log scanning
+ - Job detail popup with stdout, stderr, GPU stats, analysis
+ - Built-in shell with tab completion, history, sbatch suggestions
+ - Per-job log hints in the table
+ - Event notifications with read/unread tracking
+ - Down node detection with DEAD indicators
+
+ ## Requirements
+
+ - Python 3.8+
+ - SLURM cluster with `squeue`, `scontrol`, `sinfo`
+ - Optional: `sacct`, `sprio`, `sshare`, `sreport`, `scancel` for full features
+ - `rich` (all commands), `textual>=0.50.0` (speek+ only)
speek-0.0.2/README.md ADDED
@@ -0,0 +1,79 @@
+ # 🔍 speek
+
+ **speek** is a suite of SLURM cluster monitoring tools — from quick one-shot snapshots to a full interactive TUI.
+
+ ## Installation
+
+ ```sh
+ pip install speek
+ ```
+
+ For the latest development version:
+ ```sh
+ pip install --pre speek
+ ```
+
+ ## Commands
+
+ | Command | Description |
+ |---------|-------------|
+ | `speek0` | Classic one-shot cluster overview — GPU availability, per-user usage, job status |
+ | `speek-` | Compact snapshot — per-model GPU bars, trends, pending pressure |
+ | `speek+` | Full interactive TUI — queue, nodes, users, stats, events, shell |
+
+ ## speek0 — Classic Overview
+
+ ```sh
+ speek0 [-u USER] [-f FILE] [-t T_AVAIL]
+ ```
+
+ | Option | Description |
+ |--------|-------------|
+ | `-u USER` | Highlight a specific user (default: self) |
+ | `-f FILE` | User info CSV file |
+ | `-t T_AVAIL` | Time window for upcoming release, e.g. `5 m`, `1 h` |
+
+ Shows a table of GPU usage per partition, ranked users with `đŸ„‡đŸ„ˆđŸ„‰`, utilization-colored counts, and your current jobs.
+
+ ## speek- — Compact Snapshot
+
+ ```sh
+ speek- [-u USER]
+ ```
+
+ Per-GPU-model view with utilization bars, free/total counts, pending pressure (`⏾N`), availability trends (`↑↓`), and your running/pending jobs. Detects down nodes and shows them as DEAD.
+
+ ## speek+ — Interactive TUI
+
+ ```sh
+ speek+
+ ```
+
+ Full-featured Textual TUI with:
+
+ - **Cluster** — speek0-style usage table (tab 1)
+ - **Queue** — all cluster jobs grouped by partition, foldable
+ - **Nodes** — per-partition node status with usage bars
+ - **Users** — per-user GPU usage, fairshare, per-partition breakdown
+ - **Stats** — GPU usage charts, per-user stacked view, issue dashboard
+ - **Logs** — session CLI output (not persisted)
+ - **Settings** — theme, refresh rates, cache management, log scanning
+ - **Info** — cluster probe results, scheduling factors, error detection rules
+ - **Help** — keybindings reference
+
+ ### Features
+
+ - 70+ color themes (base16 standard)
+ - OOM and error detection (11 error types) with log scanning
+ - Job detail popup with stdout, stderr, GPU stats, analysis
+ - Built-in shell with tab completion, history, sbatch suggestions
+ - Per-job log hints in the table
+ - Event notifications with read/unread tracking
+ - Down node detection with DEAD indicators
+
+ ## Requirements
+
+ - Python 3.8+
+ - SLURM cluster with `squeue`, `scontrol`, `sinfo`
+ - Optional: `sacct`, `sprio`, `sshare`, `sreport`, `scancel` for full features
+ - `rich` (all commands), `textual>=0.50.0` (speek+ only)
speek-0.0.2/pyproject.toml ADDED
@@ -0,0 +1,40 @@
+ [build-system]
+ requires = ["setuptools"]
+ build-backend = "setuptools.build_meta"
+
+ [tool.setuptools]
+ packages = ["speek"]
+
+ [project]
+ name = "speek"
+ version = "0.0.2"
+ description = "Peek into slurm's resource info such as GPU availability, usage per user, job status, and more."
+ readme = "README.md"
+ requires-python = ">=3.8"
+ license = {file = "LICENSE"}
+ keywords = ["slurm"]
+ authors = [
+   {name = "Dongyeop Lee", email = "dylee23@postech.ac.kr"},
+ ]
+ classifiers = [
+   "Intended Audience :: Science/Research",
+   "Intended Audience :: System Administrators",
+   "License :: OSI Approved :: MIT License",
+   "Topic :: System :: Monitoring",
+   "Topic :: Utilities",
+   "Programming Language :: Python :: 3",
+   "Programming Language :: Python :: 3.8",
+   "Programming Language :: Python :: 3.9",
+   "Programming Language :: Python :: 3.10",
+   "Programming Language :: Python :: 3.11",
+   "Programming Language :: Python :: 3.12",
+ ]
+ dependencies = [
+   'rich',
+ ]
+
+ [project.scripts]
+ speek = "speek.check_slurm_resource:main"
+
+ [project.urls]
+ Repository = "https://github.com/edong6768/speek.git"
speek-0.0.2/setup.cfg ADDED
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
speek-0.0.2/speek/check_slurm_resource.py ADDED
@@ -0,0 +1,328 @@
+ import subprocess
+ from glob import glob
+ import csv
+ import re
+
+ import argparse
+ from datetime import datetime, timedelta
+
+ from rich import print
+ from rich.table import Table
+ from rich.align import Align
+ from rich.live import Live
+ from rich.console import Group
+
+ parser = argparse.ArgumentParser(description="Peek into slurm resource info.")
+ parser.add_argument('-u', '--user', default=None, type=str, help='Specify highlighted user.')
+
+ parser.add_argument('-l', '--live', action='store_true', help='Live display of speek, refreshed every second.')
+ parser.set_defaults(live=False)
+
+ parser.add_argument('-f', '--file', default='auto', type=str, help='Specify file for user info.')
+ parser.add_argument('-t', '--t_avail', default='5 m', type=str, help='Time window width for upcoming release in {m:minutes, h:hours, d:days}. (default: 5 m)')
+ args = parser.parse_args()
+
+
+ def get_scontrol_dict(unit):
+     assert unit in ['Job', 'Partition', 'Node']
+
+     scontrol_str = subprocess.check_output(['scontrol', 'show', unit]).decode('utf-8').replace(' ', '\n')
+
+     scontrols = {}
+     delimiter = f'{unit}Name=' if unit != 'Job' else 'JobId='
+     for scontrol in scontrol_str.split(delimiter):
+         if not scontrol: continue
+         n, *infos = [i for i in scontrol.split('\n') if i]
+         if unit == 'Job': n = int(n) if n != 'No' else 0
+
+         scontrols[n] = {}
+         for info in infos:
+             if '=' not in info:
+                 scontrols[n][info] = None
+                 continue
+             k, v = info.split('=', 1)
+             if ',' not in v or '[' in v:
+                 scontrols[n][k] = v
+             elif '=' in v:
+                 scontrols[n][k] = dict([i.split('=') for i in v.split(',')])
+             else:
+                 scontrols[n][k] = tuple(v.split(','))
+     return scontrols
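The key=value splitting in `get_scontrol_dict` above can be sketched standalone. The sample record and the helper name `parse_record` below are ours, invented for illustration; the sample is not real cluster output, but the splitting rules mirror the function above:

```python
# Minimal sketch of the key=value splitting used in get_scontrol_dict above.
# The sample record is invented for illustration, not real scontrol output.
def parse_record(record: str) -> dict:
    parsed = {}
    for field in record.split():
        if '=' not in field:
            parsed[field] = None              # bare flag, no value
            continue
        k, v = field.split('=', 1)
        if ',' not in v or '[' in v:
            parsed[k] = v                     # scalar; bracketed node ranges kept verbatim
        elif '=' in v:
            # nested key=value pairs, e.g. TRES=cpu=128,gres/gpu=16
            parsed[k] = dict(i.split('=') for i in v.split(','))
        else:
            parsed[k] = tuple(v.split(','))   # plain comma-separated list
    return parsed

sample = "State=UP TotalNodes=4 TRES=cpu=128,gres/gpu=16 Nodes=n[01-04]"
print(parse_record(sample))
```

Note how the `'['` check keeps node ranges like `n[01-04]` as one string instead of splitting them on the comma.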
+ def td_parse(s):
+     dt = datetime.strptime(s, '%d-%H:%M:%S') if '-' in s else datetime.strptime(s, '%H:%M:%S')
+     # strptime defaults the unparsed day field to 1, so only count days
+     # when the string actually carries a 'D-' prefix.
+     return timedelta(days=dt.day if '-' in s else 0, hours=dt.hour, minutes=dt.minute, seconds=dt.second)
+
+
+ def consecutor(lst):
+     assert all([isinstance(i, (int, float)) for i in lst]), 'List should be all numbers.'
+     lst.sort()
+     if len(lst) == 0: return ''
+     pi, *ll = lst
+     cl = [[pi]]
+     for i in ll:
+         if i-pi > 1: cl.append([i])
+         else: cl[-1].append(i)
+         pi = i
+     l_str = ' '.join([f'{{{c[0]}..{c[-1]}}}' if len(c) > 1 else f'{c[0]}' for c in cl])
+     return l_str
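`consecutor` above collapses runs of consecutive job IDs into `{first..last}` ranges for the table. The same idea as a standalone sketch (the name `compress_ids` is ours):

```python
# Collapse sorted integer IDs into brace ranges, mirroring consecutor above.
def compress_ids(ids):
    ids = sorted(ids)
    if not ids:
        return ''
    runs = [[ids[0]]]
    for i in ids[1:]:
        if i - runs[-1][-1] > 1:
            runs.append([i])      # gap of more than 1: start a new run
        else:
            runs[-1].append(i)    # consecutive (or duplicate): extend current run
    return ' '.join(f'{{{r[0]}..{r[-1]}}}' if len(r) > 1 else str(r[0]) for r in runs)

print(compress_ids([104, 101, 102, 107]))  # → {101..102} 104 107
```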
+
+
+ def get_slurm_resource():
+     ##############################################
+     #                get user info               #
+     ##############################################
+
+     # who am I
+     me = args.user
+     if me is None:
+         me = subprocess.check_output(['whoami']).decode('utf-8').strip()
+
+     # who are they
+     paths = glob(args.file)
+
+     if paths:
+         with open(paths[0], 'r', newline='', encoding='utf-8') as f:
+             reader = csv.reader(f)
+             header, *users = list(reader)
+
+         user_info = [dict(zip(header, user)) for user in users]
+         user_lookup = {}
+         for user in user_info:
+             if not user['name']: continue
+             user_lookup[user['user']] = f"{user['name']} ({user['affiliation'].split('-')[0][:2]} {user['title']}, {user['user']})"
+     else:
+         user_lookup = {}
+
+
+     ##############################################
+     #                get gpu status              #
+     ##############################################
+
+     partitions, jobs = map(get_scontrol_dict, ('Partition', 'Job'))
+
+     gres_names = ['GRES/gpu', 'gres/gpu']
+
+     for gres in gres_names:
+         if gres in partitions[[*partitions.keys()][0]]['TRESBillingWeights']:
+             break
+
+     # partitions = {k: v for k, v in partitions.items() if 'cpu' not in k}
+
+     status = {'PENDING', 'RUNNING'}
+     resource = {'Available', 'Total', 'Usage', 'max_user'}
+     release = {'Time left', 'count', 'user'}
+
+     NewState = lambda fields: {k: 0 for k in fields}
+
+     user_status, gpu_resource = {}, NewState(resource)
+     user_job_status = {}
+
+     current_time = datetime.now()
+
+     td_str = {'m': 'minutes', 'h': 'hours', 'd': 'days'}
+     t_width, t_unit = args.t_avail.split()
+     tw = timedelta(**{td_str[t_unit]: int(t_width)})
+
+     if jobs:
+         for id, job in jobs.items():
+             j_status = job.get('JobState', None)
+
+             if j_status in status:
+                 job_name = job['JobName']
+                 user, gpu = job['UserId'].split('(')[0].strip(), job['Partition']
+                 gpu_count = int(re.split(':|=', job.get('TresPerNode', 'gres:gpu:0'))[-1])
+
+                 if isinstance(gpu, tuple):
+                     gpu = tuple(sorted(gpu, key=lambda x: float(partitions[x]['TRESBillingWeights'][gres]), reverse=True))
+                     gpu_one = gpu[0]
+                 else:
+                     gpu_one = gpu
+
+                 if partitions[gpu_one]['TRESBillingWeights'][gres] == '0': continue
+
+                 # user status
+                 u_stat = user_status.get(user, NewState(status))
+                 # look up by gpu_one so multi-partition jobs accumulate instead of resetting
+                 u_stat[gpu_one] = u_stat.get(gpu_one, NewState(status))
+
+                 u_stat[j_status] += gpu_count
+                 u_stat[gpu_one][j_status] += gpu_count
+
+                 user_status[user] = u_stat
+
+                 uj_stat = user_job_status.get(user, {})
+                 uj_stat[job_name] = uj_stat.get(job_name, {})
+
+                 uj_stat[job_name][gpu] = uj_stat[job_name].get(gpu, {s: [] for s in status})
+                 uj_stat[job_name][gpu][j_status].append((id, gpu_count))
+
+                 user_job_status[user] = uj_stat
+
+                 # gpu status
+                 gpu_resource[gpu] = gpu_resource.get(gpu, NewState(resource))
+
+                 if j_status == 'RUNNING':
+                     gpu_resource['Available'] -= gpu_count
+                     gpu_resource[gpu]['Available'] -= gpu_count
+
+                     time_left = {'td': td_parse(job['TimeLimit']) - td_parse(job['RunTime']),
+                                  'count': gpu_count, 'user': user}
+
+                     up_re = gpu_resource[gpu].get('Upcoming release', [time_left, [time_left]])
+                     up_re[0] = min(time_left, up_re[0], key=lambda x: x['td'])
+
+                     up_re[1].append(time_left)
+                     up_re[1] = [t for t in up_re[1] if t['td']-up_re[0]['td'] < tw]
+
+                     up_re[0]['total_count'] = sum([t['count'] for t in up_re[1]])
+                     td = up_re[0]['td']
+                     up_re[0]['str'] = (f'{td.days}-' if td.days else '') + f"{str(td).split(', ')[-1][:-3]} ({up_re[0]['total_count']})"
+
+                     gpu_resource[gpu]['Upcoming release'] = up_re
+
+
+     for gpu, info in partitions.items():
+         if info['TRESBillingWeights'][gres] == '0': continue
+         count = int(info['TRES']['gres/gpu'])
+
+         gpu_resource[gpu] = gpu_resource.get(gpu, NewState(resource))
+
+         for s in ['Available', 'Total']:
+             gpu_resource[s] += count
+             gpu_resource[gpu][s] += count
+
+         gpu_resource['Usage'] = f"{(gpu_resource['Total'] - gpu_resource['Available'])/gpu_resource['Total']*100:.2f}%"
+         gpu_resource[gpu]['Usage'] = f"{(gpu_resource[gpu]['Total'] - gpu_resource[gpu]['Available'])/gpu_resource[gpu]['Total']*100:.2f}%"
+
+         for s in status:
+             max_user = max(user_status.items(), key=lambda x: x[1].get(gpu, NewState(status))[s]) if user_status else (None, NewState(status))
+             gpu_resource[gpu][f'max_{s}_user'] = max_user[0] if max_user[1].get(gpu, NewState(status))[s] else None
+
+     ####################################################
+     #                 print usage table                #
+     ####################################################
+
+     tables = []
+
+     ranking = {0: 'đŸ„‡', 1: 'đŸ„ˆ', 2: 'đŸ„‰'}
+     get_state = lambda p: ('☠ ' if p == 100 else 'đŸ”„' if p > 90 else 'đŸ–ïž ' if p == 0 else '❄ ' if p < 10 else '')
+     pareto = 'đŸš©'
+     king = {'RUNNING': '👑', 'PENDING': '⏳'}
+
+     table1 = Table(title="Cluster Usage")
+
+     # add columns
+     partitions_list = [p for p in sorted({*partitions.keys()} - resource) if partitions[p]['TRESBillingWeights'][gres] != '0']
+     partitions_list = sorted(partitions_list, key=lambda x: gpu_resource[x]['Total']*float(partitions[x]['TRESBillingWeights'][gres]), reverse=True)
+     table1.add_column("User")
+     for i, p in enumerate(partitions_list):
+         table1.add_column(get_state(float(gpu_resource[p]['Usage'][:-1])) + p, justify="right")
+     table1.add_column("Total", justify="right")
+
+
+     # add rows
+     for f in ['Available', 'Total', 'Usage']:
+         table1.add_row(f, *[str(gpu_resource[p][f]) for p in partitions_list], str(gpu_resource[f]))
+     table1.add_row(f'Until release (~{t_width}{td_str[t_unit][0]})', *[gpu_resource[p].get('Upcoming release', [{}])[0].get('str', '') for p in partitions_list], '', end_section=True)
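The `'str'` field built above renders a remaining-time timedelta as `D-H:MM (count)` for the "Until release" row. A standalone sketch of that formatting (the name `fmt_release` is ours):

```python
from datetime import timedelta

# Render a remaining-time timedelta the way up_re[0]['str'] is built above:
# optional day prefix, then H:MM, then the aggregated GPU count in parentheses.
def fmt_release(td: timedelta, total_count: int) -> str:
    day_part = f'{td.days}-' if td.days else ''
    hm = str(td).split(', ')[-1][:-3]   # 'H:MM:SS' -> 'H:MM'
    return f'{day_part}{hm} ({total_count})'

print(fmt_release(timedelta(days=2, hours=3, minutes=5), 4))  # → 2-3:05 (4)
print(fmt_release(timedelta(minutes=45), 1))                  # → 0:45 (1)
```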
+
+     user_status_sorted = sorted(user_status.items(), key=lambda x: (x[1]['RUNNING'], x[1]['PENDING']), reverse=True)
+     agg_running = 0
+     for i, (user, info) in enumerate(user_status_sorted):
+         all_running = v if agg_running < (v := gpu_resource['Total']-gpu_resource['Available'])*0.8 else float('inf')
+         agg_running += info['RUNNING']
+
+         style = "on bright_black" if i % 2 else ""
+         if user == me:
+             style = "black on bright_green"
+
+         me_section = (me in {user, user_status_sorted[min(i+1, len(user_status_sorted)-1)][0]})
+
+         rank = ranking.get(i, i+1 if agg_running < all_running*0.8 else pareto)
+
+         user_true = user_lookup.get(user, user)
+         state_str = lambda state: (f"{v}" if (v := state['RUNNING']) else '') + (f"({v})" if (v := state['PENDING']) else '')
+         king_str = lambda p: ''.join([king[s] for s in sorted(status) if user == gpu_resource[p][f'max_{s}_user']])
+         table1.add_row(f'{rank:>2}. {user_true}', *[king_str(p)+state_str(info.get(p, NewState(status))) for p in partitions_list], state_str(info), style=style, end_section=me_section)
+
+     tables.append(' \n')
+     tables.append(Align(table1, align='center'))
+
+
+     ##################################################
+     #                 print job table                #
+     ##################################################
+
+     jobs = user_job_status.get(me, {})
+
+     if jobs:
+         table2 = Table(title=f"{user_lookup.get(me, me)}'s Job Status")
+
+         for c in ['Status', 'Job', 'GPU', '#', 'ids']:
+             table2.add_column(c)
+         for s in sorted(status, reverse=True):
+             jobs_f = {k: {jn: j for jn, j in v.items() if j[s]} for k, v in jobs.items() if any(j[s] for j in v.values())}
+             for i, (job_name, job) in enumerate(jobs_f.items()):
+                 def keykey(gpu):
+                     if isinstance(gpu, tuple):
+                         gpu = sorted(gpu, key=lambda x: float(partitions[x]['TRESBillingWeights'][gres]))[-1]
+                     return gpu_resource[gpu]['Total']*float(partitions[gpu]['TRESBillingWeights'][gres])
+                 job_sorted = sorted(job.keys(), key=keykey, reverse=True)
+                 for j, gpu in enumerate(job_sorted):
+                     ids = job[gpu][s]
+                     if isinstance(gpu, tuple):
+                         gpu = '{' + ',\n '.join(gpu) + '}'
+                     table2.add_row(s if i+j == 0 else '', job_name if j == 0 else '', gpu, str(len(ids)), consecutor([id for id, _ in ids]), end_section=((i == len(jobs_f)-1) and (j == len(job_sorted)-1)))
+
+         tables.append(' \n ')
+         tables.append(Align(table2, align='center'))
+         tables.append(' \n ')
+
+     return Group(*tables)
+
+
+ def main():
+     if args.live:
+         import time  # local import keeps this change self-contained
+         with Live(get_slurm_resource(), refresh_per_second=1) as live:
+             while True:
+                 time.sleep(1)  # avoid busy-polling scontrol between refreshes
+                 live.update(get_slurm_resource())
+     else:
+         print(get_slurm_resource())
+
+
+ if __name__ == '__main__':
+     main()
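The `-t`/`--t_avail` value is parsed inside `get_slurm_resource` as `'<width> <unit>'` and mapped onto a `timedelta` via the `td_str` lookup. A standalone sketch of that mapping (the name `parse_window` is ours):

```python
from datetime import timedelta

# Map the '-t' argument format ('5 m', '1 h', '2 d') onto timedelta,
# mirroring the td_str lookup in get_slurm_resource above.
def parse_window(t_avail: str) -> timedelta:
    units = {'m': 'minutes', 'h': 'hours', 'd': 'days'}
    width, unit = t_avail.split()
    return timedelta(**{units[unit]: int(width)})

print(parse_window('5 m'))   # → 0:05:00
print(parse_window('1 h'))   # → 1:00:00
```

Jobs whose remaining time falls within this window of the soonest-finishing job are aggregated into the "Until release" row.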
speek-0.0.2/speek.egg-info/PKG-INFO ADDED
@@ -0,0 +1,122 @@
+ Metadata-Version: 2.4
+ Name: speek
+ Version: 0.0.2
+ Summary: Peek into slurm's resource info such as GPU availability, usage per user, job status, and more.
+ Author-email: Dongyeop Lee <dylee23@postech.ac.kr>
+ License: Copyright (c) 2024 Dongyeop Lee
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+ Project-URL: Repository, https://github.com/edong6768/speek.git
+ Keywords: slurm
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Intended Audience :: System Administrators
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Topic :: System :: Monitoring
+ Classifier: Topic :: Utilities
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: rich
+ Dynamic: license-file
+
+ # 🔍 speek
+
+ **speek** is a suite of SLURM cluster monitoring tools — from quick one-shot snapshots to a full interactive TUI.
+
+ ## Installation
+
+ ```sh
+ pip install speek
+ ```
+
+ For the latest development version:
+ ```sh
+ pip install --pre speek
+ ```
+
+ ## Commands
+
+ | Command | Description |
+ |---------|-------------|
+ | `speek0` | Classic one-shot cluster overview — GPU availability, per-user usage, job status |
+ | `speek-` | Compact snapshot — per-model GPU bars, trends, pending pressure |
+ | `speek+` | Full interactive TUI — queue, nodes, users, stats, events, shell |
+
+ ## speek0 — Classic Overview
+
+ ```sh
+ speek0 [-u USER] [-f FILE] [-t T_AVAIL]
+ ```
+
+ | Option | Description |
+ |--------|-------------|
+ | `-u USER` | Highlight a specific user (default: self) |
+ | `-f FILE` | User info CSV file |
+ | `-t T_AVAIL` | Time window for upcoming release, e.g. `5 m`, `1 h` |
+
+ Shows a table of GPU usage per partition, ranked users with `đŸ„‡đŸ„ˆđŸ„‰`, utilization-colored counts, and your current jobs.
+
+ ## speek- — Compact Snapshot
+
+ ```sh
+ speek- [-u USER]
+ ```
+
+ Per-GPU-model view with utilization bars, free/total counts, pending pressure (`⏾N`), availability trends (`↑↓`), and your running/pending jobs. Detects down nodes and shows them as DEAD.
+
+ ## speek+ — Interactive TUI
+
+ ```sh
+ speek+
+ ```
+
+ Full-featured Textual TUI with:
+
+ - **Cluster** — speek0-style usage table (tab 1)
+ - **Queue** — all cluster jobs grouped by partition, foldable
+ - **Nodes** — per-partition node status with usage bars
+ - **Users** — per-user GPU usage, fairshare, per-partition breakdown
+ - **Stats** — GPU usage charts, per-user stacked view, issue dashboard
+ - **Logs** — session CLI output (not persisted)
+ - **Settings** — theme, refresh rates, cache management, log scanning
+ - **Info** — cluster probe results, scheduling factors, error detection rules
+ - **Help** — keybindings reference
+
+ ### Features
+
+ - 70+ color themes (base16 standard)
+ - OOM and error detection (11 error types) with log scanning
+ - Job detail popup with stdout, stderr, GPU stats, analysis
+ - Built-in shell with tab completion, history, sbatch suggestions
+ - Per-job log hints in the table
+ - Event notifications with read/unread tracking
+ - Down node detection with DEAD indicators
+
+ ## Requirements
+
+ - Python 3.8+
+ - SLURM cluster with `squeue`, `scontrol`, `sinfo`
+ - Optional: `sacct`, `sprio`, `sshare`, `sreport`, `scancel` for full features
+ - `rich` (all commands), `textual>=0.50.0` (speek+ only)
speek-0.0.2/speek.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,10 @@
+ LICENSE
+ README.md
+ pyproject.toml
+ speek/check_slurm_resource.py
+ speek.egg-info/PKG-INFO
+ speek.egg-info/SOURCES.txt
+ speek.egg-info/dependency_links.txt
+ speek.egg-info/entry_points.txt
+ speek.egg-info/requires.txt
+ speek.egg-info/top_level.txt
speek-0.0.2/speek.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
+ [console_scripts]
+ speek = speek.check_slurm_resource:main
speek-0.0.2/speek.egg-info/requires.txt ADDED
@@ -0,0 +1 @@
+ rich
speek-0.0.2/speek.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ speek