guidepost 0.3.0__tar.gz → 0.3.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27) hide show
  1. guidepost-0.3.1/PKG-INFO +99 -0
  2. guidepost-0.3.1/README.md +68 -0
  3. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost/guidepost.py +20 -2
  4. guidepost-0.3.1/guidepost/version.py +2 -0
  5. guidepost-0.3.1/guidepost.egg-info/PKG-INFO +99 -0
  6. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost.egg-info/SOURCES.txt +1 -0
  7. guidepost-0.3.0/PKG-INFO +0 -225
  8. guidepost-0.3.0/guidepost/version.py +0 -2
  9. guidepost-0.3.0/guidepost.egg-info/PKG-INFO +0 -225
  10. {guidepost-0.3.0 → guidepost-0.3.1}/LICENSE +0 -0
  11. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost/__init__.py +0 -0
  12. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost/aggregation.py +0 -0
  13. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost/node_layout.py +0 -0
  14. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost/seriation.py +0 -0
  15. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost/trailmark.py +0 -0
  16. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost/utils.py +0 -0
  17. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost.egg-info/dependency_links.txt +0 -0
  18. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost.egg-info/requires.txt +0 -0
  19. {guidepost-0.3.0 → guidepost-0.3.1}/guidepost.egg-info/top_level.txt +0 -0
  20. {guidepost-0.3.0 → guidepost-0.3.1}/pyproject.toml +0 -0
  21. {guidepost-0.3.0 → guidepost-0.3.1}/setup.cfg +0 -0
  22. {guidepost-0.3.0 → guidepost-0.3.1}/setup.py +0 -0
  23. {guidepost-0.3.0 → guidepost-0.3.1}/tests/test_aggregation.py +0 -0
  24. {guidepost-0.3.0 → guidepost-0.3.1}/tests/test_list_parsing.py +0 -0
  25. {guidepost-0.3.0 → guidepost-0.3.1}/tests/test_node_layout.py +0 -0
  26. {guidepost-0.3.0 → guidepost-0.3.1}/tests/test_seriation.py +0 -0
  27. {guidepost-0.3.0 → guidepost-0.3.1}/tutorials/__init__.py +0 -0
@@ -0,0 +1,99 @@
1
+ Metadata-Version: 2.4
2
+ Name: guidepost
3
+ Version: 0.3.1
4
+ Summary: Guidepost. An overview visualization for understanding supercomputer queue data.
5
+ Home-page: https://github.com/cscully-allison/guidepost
6
+ Author: Connor Scully-Allison
7
+ Author-email: cscullyallison@sci.utah.edu
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.10
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: numpy
15
+ Requires-Dist: pandas
16
+ Requires-Dist: scikit-learn
17
+ Requires-Dist: anywidget
18
+ Requires-Dist: traitlets
19
+ Requires-Dist: pyarrow>=14
20
+ Requires-Dist: duckdb>=0.10
21
+ Dynamic: author
22
+ Dynamic: author-email
23
+ Dynamic: classifier
24
+ Dynamic: description
25
+ Dynamic: description-content-type
26
+ Dynamic: home-page
27
+ Dynamic: license-file
28
+ Dynamic: requires-dist
29
+ Dynamic: requires-python
30
+ Dynamic: summary
31
+
32
+ # Guidepost
33
+
34
+ Guidepost is a Python library for visualizing High Performance Computing (HPC) job data in Jupyter notebooks. It turns a `pandas` DataFrame of job records into a single, linked, interactive overview — faceted heatmaps framed by histograms, a categorical bar chart, and a brushable color legend — so you can spot patterns in runtimes, queue waits, and resource usage, then export the exact records you care about back into Python.
35
+
36
+ ![Annotated Guidepost visualization showing the data grouping name, color-by-categorical bar chart, and the current selection of records for export](https://i.postimg.cc/vTDMX2b3/temp-Image-MVb5ui.avif)
37
+
38
+ ## Installation
39
+
40
+ ```bash
41
+ pip install guidepost
42
+ ```
43
+
44
+ ## Quick start
45
+
46
+ ```python
47
+ from guidepost import Guidepost
48
+ import pandas as pd
49
+
50
+ gp = Guidepost()
51
+ gp.load_data(pd.read_parquet("data/jobs_data.parquet"))
52
+
53
+ gp.vis_configs = {
54
+ 'x': 'start_time', # x-axis (numeric or datetime)
55
+ 'y': 'queue_wait', # y-axis (numeric)
56
+ 'color': 'nodes_requested', # cell color (numeric)
57
+ 'color_agg': 'avg', # aggregation for color
58
+ 'categorical': 'user', # bar chart / filter
59
+ 'facet_by': 'partition' # splits the data into groups
60
+ }
61
+
62
+ gp # display in a notebook cell
63
+ ```
64
+
65
+ Brush the heatmap or its histograms, then pull the selected rows back into Python:
66
+
67
+ ```python
68
+ df = gp.retrieve_selected_data() # or: gp.selection.dataframe
69
+ ```
70
+
71
+ Input is a `pandas` DataFrame with at least three numeric and two categorical columns (datetime columns are supported on the x-axis).
72
+
73
+ ## Documentation
74
+
75
+ Full documentation lives in the **[Guidepost Wiki](https://github.com/cscully-allison/guidepost/wiki)**:
76
+
77
+ - [Getting Started](https://github.com/cscully-allison/guidepost/wiki/Getting-Started)
78
+ - [Data Requirements and Type Detection](https://github.com/cscully-allison/guidepost/wiki/Data-Requirements-and-Type-Detection)
79
+ - [Configuration](https://github.com/cscully-allison/guidepost/wiki/Configuration)
80
+ - [Understanding the Views](https://github.com/cscully-allison/guidepost/wiki/Understanding-the-Views) — and the per-view interaction guides
81
+ - [Selecting and Exporting Data](https://github.com/cscully-allison/guidepost/wiki/Selecting-and-Exporting-Data)
82
+ - [API Reference](https://github.com/cscully-allison/guidepost/wiki/API-Reference)
83
+ - [FAQ and Troubleshooting](https://github.com/cscully-allison/guidepost/wiki/FAQ-and-Troubleshooting)
84
+
85
+ ## Contributing
86
+
87
+ Contributions are welcome. Fork the repository, create a branch for your feature or bugfix, and open a pull request with a description of your changes.
88
+
89
+ ## License
90
+
91
+ Guidepost is licensed under the MIT License. See the `LICENSE` file for details.
92
+
93
+ ## Acknowledgments
94
+
95
+ Guidepost was developed under the auspices and with funding provided by the National Renewable Energy Laboratory (NREL), the National Science Foundation under NSF IIS-1844573 and IIS-2324465, and the Department of Energy under DE-SC0022044 and DE-SC0024635.
96
+
97
+ ## Contact
98
+
99
+ For questions or feedback, reach out to the maintainer at [cscullyallison@sci.utah.edu].
@@ -0,0 +1,68 @@
1
+ # Guidepost
2
+
3
+ Guidepost is a Python library for visualizing High Performance Computing (HPC) job data in Jupyter notebooks. It turns a `pandas` DataFrame of job records into a single, linked, interactive overview — faceted heatmaps framed by histograms, a categorical bar chart, and a brushable color legend — so you can spot patterns in runtimes, queue waits, and resource usage, then export the exact records you care about back into Python.
4
+
5
+ ![Annotated Guidepost visualization showing the data grouping name, color-by-categorical bar chart, and the current selection of records for export](https://i.postimg.cc/vTDMX2b3/temp-Image-MVb5ui.avif)
6
+
7
+ ## Installation
8
+
9
+ ```bash
10
+ pip install guidepost
11
+ ```
12
+
13
+ ## Quick start
14
+
15
+ ```python
16
+ from guidepost import Guidepost
17
+ import pandas as pd
18
+
19
+ gp = Guidepost()
20
+ gp.load_data(pd.read_parquet("data/jobs_data.parquet"))
21
+
22
+ gp.vis_configs = {
23
+ 'x': 'start_time', # x-axis (numeric or datetime)
24
+ 'y': 'queue_wait', # y-axis (numeric)
25
+ 'color': 'nodes_requested', # cell color (numeric)
26
+ 'color_agg': 'avg', # aggregation for color
27
+ 'categorical': 'user', # bar chart / filter
28
+ 'facet_by': 'partition' # splits the data into groups
29
+ }
30
+
31
+ gp # display in a notebook cell
32
+ ```
33
+
34
+ Brush the heatmap or its histograms, then pull the selected rows back into Python:
35
+
36
+ ```python
37
+ df = gp.retrieve_selected_data() # or: gp.selection.dataframe
38
+ ```
39
+
40
+ Input is a `pandas` DataFrame with at least three numeric and two categorical columns (datetime columns are supported on the x-axis).
41
+
42
+ ## Documentation
43
+
44
+ Full documentation lives in the **[Guidepost Wiki](https://github.com/cscully-allison/guidepost/wiki)**:
45
+
46
+ - [Getting Started](https://github.com/cscully-allison/guidepost/wiki/Getting-Started)
47
+ - [Data Requirements and Type Detection](https://github.com/cscully-allison/guidepost/wiki/Data-Requirements-and-Type-Detection)
48
+ - [Configuration](https://github.com/cscully-allison/guidepost/wiki/Configuration)
49
+ - [Understanding the Views](https://github.com/cscully-allison/guidepost/wiki/Understanding-the-Views) — and the per-view interaction guides
50
+ - [Selecting and Exporting Data](https://github.com/cscully-allison/guidepost/wiki/Selecting-and-Exporting-Data)
51
+ - [API Reference](https://github.com/cscully-allison/guidepost/wiki/API-Reference)
52
+ - [FAQ and Troubleshooting](https://github.com/cscully-allison/guidepost/wiki/FAQ-and-Troubleshooting)
53
+
54
+ ## Contributing
55
+
56
+ Contributions are welcome. Fork the repository, create a branch for your feature or bugfix, and open a pull request with a description of your changes.
57
+
58
+ ## License
59
+
60
+ Guidepost is licensed under the MIT License. See the `LICENSE` file for details.
61
+
62
+ ## Acknowledgments
63
+
64
+ Guidepost was developed under the auspices and with funding provided by the National Renewable Energy Laboratory (NREL), the National Science Foundation under NSF IIS-1844573 and IIS-2324465, and the Department of Energy under DE-SC0022044 and DE-SC0024635.
65
+
66
+ ## Contact
67
+
68
+ For questions or feedback, reach out to the maintainer at [cscullyallison@sci.utah.edu].
@@ -21,6 +21,22 @@ from .aggregation import AggregationEngine
21
21
  SYNTHETIC_FACET_COL = "__gp_no_grouping__"
22
22
  SYNTHETIC_FACET_VALUE = "All records"
23
23
 
24
+
25
+ class Selection:
26
+ """Wrapper around the records selected in the widget.
27
+
28
+ The selected DataFrame is exposed as `.dataframe`. This indirection
29
+ leaves room to attach further selection metadata in the future without
30
+ changing the `gp.selection` access pattern.
31
+ """
32
+
33
+ def __init__(self, dataframe):
34
+ self.dataframe = dataframe
35
+
36
+ def __repr__(self):
37
+ return f"Selection(dataframe={self.dataframe!r})"
38
+
39
+
24
40
  class Guidepost(anywidget.AnyWidget):
25
41
 
26
42
  _esm = os.path.join(os.path.dirname(__file__), "static", "guidepost.js")
@@ -37,7 +53,6 @@ class Guidepost(anywidget.AnyWidget):
37
53
 
38
54
  selected_records = traitlets.Unicode("[]").tag(sync=True)
39
55
  records_df = pd.DataFrame()
40
- selection = None
41
56
 
42
57
  _summary_stats = traitlets.Dict({}).tag(sync=True)
43
58
 
@@ -272,7 +287,10 @@ class Guidepost(anywidget.AnyWidget):
272
287
 
273
288
  @property
274
289
  def selection(self):
275
- return self.retrieve_selected_data()
290
+ # `selection` is intentionally an object wrapper rather than the bare
291
+ # DataFrame so additional selection metadata can be hung off it later
292
+ # without breaking callers. The DataFrame lives on `.dataframe`.
293
+ return Selection(self.retrieve_selected_data())
276
294
 
277
295
  def retrieve_selected_data(self):
278
296
  if self.cached_records_df is None:
@@ -0,0 +1,2 @@
1
+ __version_info__ = ("0", "3", "1")
2
+ __version__ = ".".join(__version_info__)
@@ -0,0 +1,99 @@
1
+ Metadata-Version: 2.4
2
+ Name: guidepost
3
+ Version: 0.3.1
4
+ Summary: Guidepost. An overview visualization for understanding supercomputer queue data.
5
+ Home-page: https://github.com/cscully-allison/guidepost
6
+ Author: Connor Scully-Allison
7
+ Author-email: cscullyallison@sci.utah.edu
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.10
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: numpy
15
+ Requires-Dist: pandas
16
+ Requires-Dist: scikit-learn
17
+ Requires-Dist: anywidget
18
+ Requires-Dist: traitlets
19
+ Requires-Dist: pyarrow>=14
20
+ Requires-Dist: duckdb>=0.10
21
+ Dynamic: author
22
+ Dynamic: author-email
23
+ Dynamic: classifier
24
+ Dynamic: description
25
+ Dynamic: description-content-type
26
+ Dynamic: home-page
27
+ Dynamic: license-file
28
+ Dynamic: requires-dist
29
+ Dynamic: requires-python
30
+ Dynamic: summary
31
+
32
+ # Guidepost
33
+
34
+ Guidepost is a Python library for visualizing High Performance Computing (HPC) job data in Jupyter notebooks. It turns a `pandas` DataFrame of job records into a single, linked, interactive overview — faceted heatmaps framed by histograms, a categorical bar chart, and a brushable color legend — so you can spot patterns in runtimes, queue waits, and resource usage, then export the exact records you care about back into Python.
35
+
36
+ ![Annotated Guidepost visualization showing the data grouping name, color-by-categorical bar chart, and the current selection of records for export](https://i.postimg.cc/vTDMX2b3/temp-Image-MVb5ui.avif)
37
+
38
+ ## Installation
39
+
40
+ ```bash
41
+ pip install guidepost
42
+ ```
43
+
44
+ ## Quick start
45
+
46
+ ```python
47
+ from guidepost import Guidepost
48
+ import pandas as pd
49
+
50
+ gp = Guidepost()
51
+ gp.load_data(pd.read_parquet("data/jobs_data.parquet"))
52
+
53
+ gp.vis_configs = {
54
+ 'x': 'start_time', # x-axis (numeric or datetime)
55
+ 'y': 'queue_wait', # y-axis (numeric)
56
+ 'color': 'nodes_requested', # cell color (numeric)
57
+ 'color_agg': 'avg', # aggregation for color
58
+ 'categorical': 'user', # bar chart / filter
59
+ 'facet_by': 'partition' # splits the data into groups
60
+ }
61
+
62
+ gp # display in a notebook cell
63
+ ```
64
+
65
+ Brush the heatmap or its histograms, then pull the selected rows back into Python:
66
+
67
+ ```python
68
+ df = gp.retrieve_selected_data() # or: gp.selection.dataframe
69
+ ```
70
+
71
+ Input is a `pandas` DataFrame with at least three numeric and two categorical columns (datetime columns are supported on the x-axis).
72
+
73
+ ## Documentation
74
+
75
+ Full documentation lives in the **[Guidepost Wiki](https://github.com/cscully-allison/guidepost/wiki)**:
76
+
77
+ - [Getting Started](https://github.com/cscully-allison/guidepost/wiki/Getting-Started)
78
+ - [Data Requirements and Type Detection](https://github.com/cscully-allison/guidepost/wiki/Data-Requirements-and-Type-Detection)
79
+ - [Configuration](https://github.com/cscully-allison/guidepost/wiki/Configuration)
80
+ - [Understanding the Views](https://github.com/cscully-allison/guidepost/wiki/Understanding-the-Views) — and the per-view interaction guides
81
+ - [Selecting and Exporting Data](https://github.com/cscully-allison/guidepost/wiki/Selecting-and-Exporting-Data)
82
+ - [API Reference](https://github.com/cscully-allison/guidepost/wiki/API-Reference)
83
+ - [FAQ and Troubleshooting](https://github.com/cscully-allison/guidepost/wiki/FAQ-and-Troubleshooting)
84
+
85
+ ## Contributing
86
+
87
+ Contributions are welcome. Fork the repository, create a branch for your feature or bugfix, and open a pull request with a description of your changes.
88
+
89
+ ## License
90
+
91
+ Guidepost is licensed under the MIT License. See the `LICENSE` file for details.
92
+
93
+ ## Acknowledgments
94
+
95
+ Guidepost was developed under the auspices and with funding provided by the National Renewable Energy Laboratory (NREL), the National Science Foundation under NSF IIS-1844573 and IIS-2324465, and the Department of Energy under DE-SC0022044 and DE-SC0024635.
96
+
97
+ ## Contact
98
+
99
+ For questions or feedback, reach out to the maintainer at [cscullyallison@sci.utah.edu].
@@ -1,4 +1,5 @@
1
1
  LICENSE
2
+ README.md
2
3
  pyproject.toml
3
4
  setup.py
4
5
  guidepost/__init__.py
guidepost-0.3.0/PKG-INFO DELETED
@@ -1,225 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: guidepost
3
- Version: 0.3.0
4
- Summary: Guidepost. An overview visualization for understanding supercomputer queue data.
5
- Home-page: https://github.com/cscully-allison/guidepost
6
- Author: Connor Scully-Allison
7
- Author-email: cscullyallison@sci.utah.edu
8
- Classifier: Programming Language :: Python :: 3
9
- Classifier: License :: OSI Approved :: MIT License
10
- Classifier: Operating System :: OS Independent
11
- Requires-Python: >=3.10
12
- Description-Content-Type: text/markdown
13
- License-File: LICENSE
14
- Requires-Dist: numpy
15
- Requires-Dist: pandas
16
- Requires-Dist: scikit-learn
17
- Requires-Dist: anywidget
18
- Requires-Dist: traitlets
19
- Requires-Dist: pyarrow>=14
20
- Requires-Dist: duckdb>=0.10
21
- Dynamic: author
22
- Dynamic: author-email
23
- Dynamic: classifier
24
- Dynamic: description
25
- Dynamic: description-content-type
26
- Dynamic: home-page
27
- Dynamic: license-file
28
- Dynamic: requires-dist
29
- Dynamic: requires-python
30
- Dynamic: summary
31
-
32
- # Guidepost
33
-
34
- Guidepost is a Python library designed to visualize High Performance Computing (HPC) job data in jupyter notebooks. It simplifies the process of understanding HPC workloads by providing a single, interactive visualization that offers an intuitive overview of job performance, resource usage, and other critical metrics.
35
-
36
- ---
37
-
38
- ## Features
39
-
40
- - **Jupyter Notebook Integration**: Designed for your existing workflow. Load and interact with the visualization directly in your Jupyter environment.
41
- - **HPC Job Data Insights**: Visualize key metrics, including job runtimes, resource usage, and queue performance.
42
- - **Interactive Exploration**: Export selections of specific jobs or groups of jobs for deeper analysis.
43
-
44
- ---
45
-
46
- ## Installation
47
-
48
- Guidepost is available on PyPI. You can install it using pip:
49
-
50
- ```bash
51
- pip install guidepost
52
- ```
53
-
54
- ---
55
-
56
- ## Quick Start
57
-
58
- ### 1. Import and Initialize Guidepost
59
-
60
- ```python
61
- from guidepost import Guidepost
62
- gp = Guidepost()
63
- ```
64
-
65
- ### 2. Load Your Data
66
-
67
-
68
- ```python
69
- import pandas as pd
70
- jobs_data = pd.read_parquet("data/jobs_data.parquet")
71
- gp.load_data(jobs_data)
72
- ```
73
-
74
- Guidepost supports input data in a pandas DataFrame format.
75
-
76
- At least three numerical and 2 categorical columns are required. Datetime columns are also supported for encoding on the x axis.
77
-
78
- Here is a sample table containg jobs-related data from a supercomputer scheduling system:
79
-
80
- | job_id |start_time | queue_wait | nodes_requested | partition | status | user |
81
- |--------|-----------------------------|----------------|-----------------|-----------|------------|--------|
82
- | 12345 | 2023-11-01 21:19:33 |5.2 | 10 | short | Complete | User1 |
83
- | 12346 | 2023-11-01 21:20:01 |12.0 | 20 | long | Running | User2 |
84
-
85
- In this example, the three data values we will use for our x, y and color variables are: start_time, queue_wait and nodes_requested. We would also like to use `parition` to facet our data and `user` as an additional categorical variable to filter on. In the [next section](#3-configure-visualization), we show how to specify which columns in your dataset correspond to parts of the visualization.
86
-
87
- The `load_data()` function will format your data for json serialization and will update the visualization if it has already been run. This function will report out any columns or rows which are dropped from the original dataset due to conainting `null`/`NaN`/`None` values or unallowed datatypes like `timedelta`s.
88
-
89
- ### 3. Configure Visualization
90
- ```python
91
- gp.vis_configs = {
92
- 'x': 'start_time',
93
- 'y': 'queue_wait',
94
- 'color': 'nodes_requested',
95
- 'color_agg': 'avg',
96
- 'categorical': 'user',
97
- 'facet_by': 'partition'
98
- }
99
- ```
100
- #### Configuration Descriptions:
101
-
102
- - `x`: Name of the column in the dataframe which will be shown on the x axis of Guidepost's subcharts.
103
- - `y`: Name of the column in the dataframe which will be shown on the y axis of Guidepost's subcharts.
104
- - `color`: Name of the column in the dataframe which will be shown by the darkness of each square's color.
105
- - `color_agg`: The aggregation method used to determine the color. Can be: 'avg', 'variance', 'std', 'sum', or 'median'
106
- - `categorical`: Name of the column containing categorical data values which will be shown on a bar chart associated with each group of the data.
107
- - `facet_by`: Name of the column containing categorical data values which dictate the highest level grouping of the data and organizes the data into groups of subcharts.
108
-
109
- See the [Vis Configs Section](#vis_configs) for more details on datatype restrictions for each configuration.
110
-
111
- ### 4. Run Visualization
112
- ```python
113
- gp
114
- ```
115
-
116
- Run the above command in a Jupyter notebook cell to start the visualization.
117
-
118
- Here is an example of what the viusalization will look like:
119
-
120
- ![Image of the guidepost visualization. Annotations label various parts of the visualization: 'Data Grouping Name', 'Color by Categorical Variable', 'Bar Chart (Filter on Click)', 'Current Selection of Records for Export'](https://i.postimg.cc/vTDMX2b3/temp-Image-MVb5ui.avif)
121
-
122
-
123
- Here we explain some elements of the visualization:
124
-
125
- #### `Data Grouping Name`:
126
- This is name of the high level groups which are dictated by the `facet_by` configuration. Each group of subcharts corresponds to all data associated with an instance of a value in `facet_by`. If your data only logically contains one group, adding a synthetic column and specifying that column name for the `facet_by` cofiguration is advised.
127
-
128
- #### `Main Summary View`:
129
- The main summary view is the primary view associated with each group of data specified by `facet_by` configuration. This view shows the data organized by the x and y axes. Data values at similar locations along the x and y axes are grouped into squares at that location. The amount of data in each row and column are shown with the histograms framing this view. The color of each square shows an aggregrate of a third numerical variable that exists on each data value.
130
-
131
- #### `Color by Numerical Variable`:
132
- Each square in the main summary view is an aggregrate of datapoints at that x and y location. The color of a given square is dictated by the variable shown here. For example, in the bottom subchart, we see that there is a correlation between higher queue_waits, queue_wait_predictions and processor counts. The darker squares indicate higher processor counts on average.
133
-
134
-
135
- #### `Bar Chart (Filter on Click)`:
136
- The bar chart in the lower right hand corner of each row of subcharts shows the top ten instances of the column passed to the `categorical` configuration. It will filter the dataset when a bar is hovered over. Clicking a bar will fix that filter in place. Clicking again will remove the filter when the mouse leaves the bar.
137
-
138
-
139
- #### `Current Seleciton of Records for Export`:
140
- Records can be selected for export from the visualization by brushing over the right and bottom histograms. The area of selected data is indicated by the orange coloring on the main summary view. The amount of records selected is indicated at the top left for each chart. Selections can be made across multiple charts. The final selection is returned as one dataframe containg all selections.
141
-
142
-
143
- ### 5. Retrieve Selections from Visualization
144
-
145
- ```python
146
- df = gp.retrieve_selected_data()
147
- ```
148
-
149
- After selecting data by brushing over either the bottom or right histograms associated with a subchart, you can retrieve selected data using the above method.
150
-
151
- This will return a pandas DataFrame containing all your subselected rows from the original dataset.
152
-
153
-
154
-
155
-
156
- ---
157
-
158
- ## Example Dataset
159
- Below is an example of the kind of data Guidepost works with:
160
-
161
- | job_id |start_time | queue_wait | nodes_requested | partition | status | user |
162
- |--------|-----------------------------|----------------|-----------------|-----------|------------|--------|
163
- | 12345 | 2023-11-01 21:19:33 |5.2 | 10 | short | Complete | User1 |
164
- | 12346 | 2023-11-01 21:20:01 |12.0 | 20 | long | Running | User2 |
165
-
166
- ---
167
-
168
- ## API Reference
169
-
170
- ### `vis_data`
171
- - **Description**: Holds the vis data to passed to the visualization. Updates to this variable will automatically update the visualization.
172
-
173
-
174
- ### `vis_configs`
175
- - **Description**: Holds the vis configurations to passed to the visualization. Updates to this variable will automatically update the visualization.
176
-
177
- Vis configurations must be specified as a python dictonary with the following fields:
178
- - 'x': The column from the pandas dataframe which will be shown on the x axis. This can be a integer, float or datetime variable.
179
- - 'y': The column from the pandas dataframe which will be shown on the y axis of this visualization. This can be an integer or float.
180
- - 'color': The column from the pandas dataframe which will determine the color of squares in the main summary view. This can be an integer or float.
181
- - 'color_agg': This is a specification for what aggregation is used for the color variable. It can be: 'avg', 'variance', 'std', 'sum', or 'median'
182
- - 'categorical': A categorical variable from the dataset. The data column must be a string datatype. The visualization will show the top 10 instances of this variable.
183
- - 'facet_by': A categorical variable from the dataset. Automatically looks for 'queue' or 'partition' if this config is not specified.
184
-
185
-
186
- ### `load_data(in_df, supress_warnings)`
187
- - **Description**: Loads a pandas dataframe into the guidepost system for visualizaiton. Will report data dropped from the dataframe if it contains NaNs, `timedeltas`, `arrays` in cells, or other invalid values.
188
- - **Arguments**:
189
- - `in_df` (Pandas Dataframe): The dataframe containing data to be visualized.
190
- - `supress_warnings` (Boolean): Specifies whether to suppress warnings when loading data. Defaults to `False`
191
-
192
-
193
- ### `retrieve_selected_data()`
194
- - **Description**: Returns selected data back from the visualization.
195
- - **Returns**:
196
- - `subselection` (DataFrame or str): A Pandas DataFrame that contains subselected data specified from selections made to the visualization.
197
-
198
- ---
199
-
200
- ## Contributing
201
-
202
- Contributions to Guidepost are welcome! To contribute:
203
-
204
- 1. Fork the repository.
205
- 2. Create a new branch for your feature or bugfix.
206
- 3. Submit a pull request with a detailed description of your changes.
207
-
208
- ---
209
-
210
- ## License
211
-
212
- Guidepost is licensed under the MIT License. See the `LICENSE` file for details.
213
-
214
- ---
215
-
216
- ## Acknowledgments
217
-
218
- Guidepost was developed under the auspices and with funding provided by the National Renewable Energy Laboratory (NREL), the National Science Foundation under NSF IIS-1844573 and IIS-2324465, and the Department of Energy under DE-SC0022044 and DE-SC0024635.
219
-
220
- ---
221
-
222
- ## Contact
223
-
224
- For questions or feedback, please reach out to the maintainer at [cscullyallison@sci.utah.edu].
225
-
@@ -1,2 +0,0 @@
1
- __version_info__ = ("0", "3", "0")
2
- __version__ = ".".join(__version_info__)
@@ -1,225 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: guidepost
3
- Version: 0.3.0
4
- Summary: Guidepost. An overview visualization for understanding supercomputer queue data.
5
- Home-page: https://github.com/cscully-allison/guidepost
6
- Author: Connor Scully-Allison
7
- Author-email: cscullyallison@sci.utah.edu
8
- Classifier: Programming Language :: Python :: 3
9
- Classifier: License :: OSI Approved :: MIT License
10
- Classifier: Operating System :: OS Independent
11
- Requires-Python: >=3.10
12
- Description-Content-Type: text/markdown
13
- License-File: LICENSE
14
- Requires-Dist: numpy
15
- Requires-Dist: pandas
16
- Requires-Dist: scikit-learn
17
- Requires-Dist: anywidget
18
- Requires-Dist: traitlets
19
- Requires-Dist: pyarrow>=14
20
- Requires-Dist: duckdb>=0.10
21
- Dynamic: author
22
- Dynamic: author-email
23
- Dynamic: classifier
24
- Dynamic: description
25
- Dynamic: description-content-type
26
- Dynamic: home-page
27
- Dynamic: license-file
28
- Dynamic: requires-dist
29
- Dynamic: requires-python
30
- Dynamic: summary
31
-
32
- # Guidepost
33
-
34
- Guidepost is a Python library designed to visualize High Performance Computing (HPC) job data in jupyter notebooks. It simplifies the process of understanding HPC workloads by providing a single, interactive visualization that offers an intuitive overview of job performance, resource usage, and other critical metrics.
35
-
36
- ---
37
-
38
- ## Features
39
-
40
- - **Jupyter Notebook Integration**: Designed for your existing workflow. Load and interact with the visualization directly in your Jupyter environment.
41
- - **HPC Job Data Insights**: Visualize key metrics, including job runtimes, resource usage, and queue performance.
42
- - **Interactive Exploration**: Export selections of specific jobs or groups of jobs for deeper analysis.
43
-
44
- ---
45
-
46
- ## Installation
47
-
48
- Guidepost is available on PyPI. You can install it using pip:
49
-
50
- ```bash
51
- pip install guidepost
52
- ```
53
-
54
- ---
55
-
56
- ## Quick Start
57
-
58
- ### 1. Import and Initialize Guidepost
59
-
60
- ```python
61
- from guidepost import Guidepost
62
- gp = Guidepost()
63
- ```
64
-
65
- ### 2. Load Your Data
66
-
67
-
68
- ```python
69
- import pandas as pd
70
- jobs_data = pd.read_parquet("data/jobs_data.parquet")
71
- gp.load_data(jobs_data)
72
- ```
73
-
74
- Guidepost supports input data in a pandas DataFrame format.
75
-
76
- At least three numerical and 2 categorical columns are required. Datetime columns are also supported for encoding on the x axis.
77
-
78
- Here is a sample table containg jobs-related data from a supercomputer scheduling system:
79
-
80
- | job_id |start_time | queue_wait | nodes_requested | partition | status | user |
81
- |--------|-----------------------------|----------------|-----------------|-----------|------------|--------|
82
- | 12345 | 2023-11-01 21:19:33 |5.2 | 10 | short | Complete | User1 |
83
- | 12346 | 2023-11-01 21:20:01 |12.0 | 20 | long | Running | User2 |
84
-
85
- In this example, the three data values we will use for our x, y and color variables are: start_time, queue_wait and nodes_requested. We would also like to use `parition` to facet our data and `user` as an additional categorical variable to filter on. In the [next section](#3-configure-visualization), we show how to specify which columns in your dataset correspond to parts of the visualization.
86
-
87
- The `load_data()` function will format your data for json serialization and will update the visualization if it has already been run. This function will report out any columns or rows which are dropped from the original dataset due to conainting `null`/`NaN`/`None` values or unallowed datatypes like `timedelta`s.
88
-
89
- ### 3. Configure Visualization
90
- ```python
91
- gp.vis_configs = {
92
- 'x': 'start_time',
93
- 'y': 'queue_wait',
94
- 'color': 'nodes_requested',
95
- 'color_agg': 'avg',
96
- 'categorical': 'user',
97
- 'facet_by': 'partition'
98
- }
99
- ```
100
- #### Configuration Descriptions:
101
-
102
- - `x`: Name of the column in the dataframe which will be shown on the x axis of Guidepost's subcharts.
103
- - `y`: Name of the column in the dataframe which will be shown on the y axis of Guidepost's subcharts.
104
- - `color`: Name of the column in the dataframe which will be shown by the darkness of each square's color.
105
- - `color_agg`: The aggregation method used to determine the color. Can be: 'avg', 'variance', 'std', 'sum', or 'median'
106
- - `categorical`: Name of the column containing categorical data values which will be shown on a bar chart associated with each group of the data.
107
- - `facet_by`: Name of the column containing categorical data values which dictate the highest level grouping of the data and organizes the data into groups of subcharts.
108
-
109
- See the [Vis Configs Section](#vis_configs) for more details on datatype restrictions for each configuration.
110
-
111
- ### 4. Run Visualization
112
- ```python
113
- gp
114
- ```
115
-
116
- Run the above command in a Jupyter notebook cell to start the visualization.
117
-
118
- Here is an example of what the viusalization will look like:
119
-
120
- ![Image of the guidepost visualization. Annotations label various parts of the visualization: 'Data Grouping Name', 'Color by Categorical Variable', 'Bar Chart (Filter on Click)', 'Current Selection of Records for Export'](https://i.postimg.cc/vTDMX2b3/temp-Image-MVb5ui.avif)
121
-
122
-
123
- Here we explain some elements of the visualization:
124
-
125
- #### `Data Grouping Name`:
126
- This is name of the high level groups which are dictated by the `facet_by` configuration. Each group of subcharts corresponds to all data associated with an instance of a value in `facet_by`. If your data only logically contains one group, adding a synthetic column and specifying that column name for the `facet_by` cofiguration is advised.
127
-
128
- #### `Main Summary View`:
129
- The main summary view is the primary view associated with each group of data specified by `facet_by` configuration. This view shows the data organized by the x and y axes. Data values at similar locations along the x and y axes are grouped into squares at that location. The amount of data in each row and column are shown with the histograms framing this view. The color of each square shows an aggregrate of a third numerical variable that exists on each data value.
130
-
131
- #### `Color by Numerical Variable`:
132
- Each square in the main summary view is an aggregrate of datapoints at that x and y location. The color of a given square is dictated by the variable shown here. For example, in the bottom subchart, we see that there is a correlation between higher queue_waits, queue_wait_predictions and processor counts. The darker squares indicate higher processor counts on average.
133
-
134
-
135
- #### `Bar Chart (Filter on Click)`:
136
- The bar chart in the lower right hand corner of each row of subcharts shows the top ten instances of the column passed to the `categorical` configuration. It will filter the dataset when a bar is hovered over. Clicking a bar will fix that filter in place. Clicking again will remove the filter when the mouse leaves the bar.
137
-
138
-
139
- #### `Current Seleciton of Records for Export`:
140
- Records can be selected for export from the visualization by brushing over the right and bottom histograms. The area of selected data is indicated by the orange coloring on the main summary view. The amount of records selected is indicated at the top left for each chart. Selections can be made across multiple charts. The final selection is returned as one dataframe containg all selections.
141
-
142
-
143
- ### 5. Retrieve Selections from Visualization
144
-
145
- ```python
146
- df = gp.retrieve_selected_data()
147
- ```
148
-
149
- After selecting data by brushing over either the bottom or right histograms associated with a subchart, you can retrieve selected data using the above method.
150
-
151
- This will return a pandas DataFrame containing all your subselected rows from the original dataset.
152
-
153
-
154
-
155
-
156
- ---
157
-
158
- ## Example Dataset
159
- Below is an example of the kind of data Guidepost works with:
160
-
161
- | job_id |start_time | queue_wait | nodes_requested | partition | status | user |
162
- |--------|-----------------------------|----------------|-----------------|-----------|------------|--------|
163
- | 12345 | 2023-11-01 21:19:33 |5.2 | 10 | short | Complete | User1 |
164
- | 12346 | 2023-11-01 21:20:01 |12.0 | 20 | long | Running | User2 |
165
-
166
- ---
167
-
168
- ## API Reference
169
-
170
- ### `vis_data`
171
- - **Description**: Holds the vis data to passed to the visualization. Updates to this variable will automatically update the visualization.
172
-
173
-
174
- ### `vis_configs`
175
- - **Description**: Holds the vis configurations to passed to the visualization. Updates to this variable will automatically update the visualization.
176
-
177
- Vis configurations must be specified as a python dictonary with the following fields:
178
- - 'x': The column from the pandas dataframe which will be shown on the x axis. This can be a integer, float or datetime variable.
179
- - 'y': The column from the pandas dataframe which will be shown on the y axis of this visualization. This can be an integer or float.
180
- - 'color': The column from the pandas dataframe which will determine the color of squares in the main summary view. This can be an integer or float.
181
- - 'color_agg': This is a specification for what aggregation is used for the color variable. It can be: 'avg', 'variance', 'std', 'sum', or 'median'
182
- - 'categorical': A categorical variable from the dataset. The data column must be a string datatype. The visualization will show the top 10 instances of this variable.
183
- - 'facet_by': A categorical variable from the dataset. Automatically looks for 'queue' or 'partition' if this config is not specified.
184
-
185
-
186
- ### `load_data(in_df, supress_warnings)`
187
- - **Description**: Loads a pandas dataframe into the guidepost system for visualizaiton. Will report data dropped from the dataframe if it contains NaNs, `timedeltas`, `arrays` in cells, or other invalid values.
188
- - **Arguments**:
189
- - `in_df` (Pandas Dataframe): The dataframe containing data to be visualized.
190
- - `supress_warnings` (Boolean): Specifies whether to suppress warnings when loading data. Defaults to `False`
191
-
192
-
193
- ### `retrieve_selected_data()`
194
- - **Description**: Returns selected data back from the visualization.
195
- - **Returns**:
196
- - `subselection` (DataFrame or str): A Pandas DataFrame that contains subselected data specified from selections made to the visualization.
197
-
198
- ---
199
-
200
- ## Contributing
201
-
202
- Contributions to Guidepost are welcome! To contribute:
203
-
204
- 1. Fork the repository.
205
- 2. Create a new branch for your feature or bugfix.
206
- 3. Submit a pull request with a detailed description of your changes.
207
-
208
- ---
209
-
210
- ## License
211
-
212
- Guidepost is licensed under the MIT License. See the `LICENSE` file for details.
213
-
214
- ---
215
-
216
- ## Acknowledgments
217
-
218
- Guidepost was developed under the auspices and with funding provided by the National Renewable Energy Laboratory (NREL), the National Science Foundation under NSF IIS-1844573 and IIS-2324465, and the Department of Energy under DE-SC0022044 and DE-SC0024635.
219
-
220
- ---
221
-
222
- ## Contact
223
-
224
- For questions or feedback, please reach out to the maintainer at [cscullyallison@sci.utah.edu].
225
-
File without changes
File without changes
File without changes
File without changes
File without changes