PyPI - better-git-of-theseus - Versions diffs - 0.4.2__py3-none-any.whl → 0.5.0__py3-none-any.whl - Mend

better-git-of-theseus 0.4.2__py3-none-any.whl → 0.5.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

{better_git_of_theseus-0.4.2.dist-info → better_git_of_theseus-0.5.0.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: better-git-of-theseus
-Version: 0.4.2
+Version: 0.5.0
 Summary: Plot stats on Git repositories with interactive Plotly charts
 Home-page: https://github.com/onewesong/better-git-of-theseus
 Author: Erik Bernhardsson
@@ -16,6 +16,7 @@ Requires-Dist: plotly
 Requires-Dist: streamlit
 Requires-Dist: python-dateutil
 Requires-Dist: scipy
+Requires-Dist: matplotlib
 Dynamic: author
 Dynamic: author-email
 Dynamic: description
@@ -41,7 +42,8 @@ Dynamic: summary
 **Better Git of Theseus** is a modern refactor of the original [git-of-theseus](https://github.com/erikbern/git-of-theseus). It provides a fully interactive Web Dashboard powered by **Streamlit** and **Plotly**, making it easier than ever to visualize how your code evolves over time.
-![Git of Theseus Dashboard](https://raw.githubusercontent.com/erikbern/git-of-theseus/master/pics/git-git.png) *(Note: Charts are now fully interactive!)*
+![Git of Theseus Cohorts](pics/plot-cohorts.png)
+![Git of Theseus Authors](pics/plot-authors.png)
 ## Key Enhancements
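The METADATA change above adds `matplotlib` to the declared runtime dependencies. Wheel METADATA uses RFC 822-style headers, so the full `Requires-Dist` list can be read with the standard library's `email` parser. The sketch below is illustrative only: `requires_dist` is a hypothetical helper, and the sample text reproduces just the header lines visible in this diff, not the complete 0.5.0 METADATA file.

```python
from email.parser import HeaderParser


def requires_dist(metadata_text: str) -> list[str]:
    """Return every Requires-Dist header in a METADATA document (empty list if none)."""
    # HeaderParser reads headers only; the long-description body after the
    # first blank line is left unparsed.
    headers = HeaderParser().parsestr(metadata_text)
    return headers.get_all("Requires-Dist") or []


# Only the header lines shown in this diff; the real file contains more fields.
SAMPLE_0_5_0 = """\
Metadata-Version: 2.4
Name: better-git-of-theseus
Version: 0.5.0
Requires-Dist: plotly
Requires-Dist: streamlit
Requires-Dist: python-dateutil
Requires-Dist: scipy
Requires-Dist: matplotlib

**Better Git of Theseus** is a modern refactor...
"""

print(requires_dist(SAMPLE_0_5_0))
```

For an installed copy of the package, `importlib.metadata.requires("better-git-of-theseus")` returns the same kind of list without reading the file by hand.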

{better_git_of_theseus-0.4.2.dist-info → better_git_of_theseus-0.5.0.dist-info}/RECORD RENAMED

@@ -1,15 +1,15 @@
-better_git_of_theseus-0.4.2.dist-info/licenses/LICENSE,sha256=yNNDAWUe1WLKnuUcRp9X95C-yP2lfGl69m97Ftw-DUw,11345
+better_git_of_theseus-0.5.0.dist-info/licenses/LICENSE,sha256=yNNDAWUe1WLKnuUcRp9X95C-yP2lfGl69m97Ftw-DUw,11345
 git_of_theseus/__init__.py,sha256=LeG5tCOgvZMmKOjmO_HRg54sWF2K3-lTBf8H_vHMFio,273
 git_of_theseus/analyze.py,sha256=78E1G2FdSS9VZd0jKSnO5gpXwzNCjtzkSAxSzadYM3A,21547
-git_of_theseus/app.py,sha256=8aS72pMg4CyEqdYhcsz1QZSxYw8wh_iQLvb-2LOK25A,5192
-git_of_theseus/cmd.py,sha256=4W8C0tb-9Uejq4WRjGyYKXmZgE4HQBk7KFIKrozY4Og,639
+git_of_theseus/app.py,sha256=6GuBWC3WaN0VkwHAmKM_6ZNalf7NcXgXYq3ZP6XMDvY,8463
+git_of_theseus/cmd.py,sha256=kvi3sgC0ICj7PvXuzCQsijU5swS3JkfAtIT5yZRNbFo,550
 git_of_theseus/line_plot.py,sha256=LegoVy0VEFT4sM5fYCES-I_2H9UaerCopDI3J2dyHeU,3117
-git_of_theseus/plotly_plots.py,sha256=c_9rJo3qlOy-TdHPsvuDH-6hVVO0_xYq-DmOnmgqOCE,7414
+git_of_theseus/plotly_plots.py,sha256=Ru-Xwj8c4XdDJfzjDUqU0CxddmLq76LTtX5r0VqSYvs,7267
 git_of_theseus/stack_plot.py,sha256=q4-YlW3PyiwbIBFeHBA3dsdR1I_XKUQD74hAuSfhIR4,3150
 git_of_theseus/survival_plot.py,sha256=NEITAa0pMD9uJVsPd7JA71ucavnG1RxgC-F6Jk-K5bE,4868
 git_of_theseus/utils.py,sha256=Xw2udch9ixSgFInGhIC4_RJ_9IB3E8MmV1dmznavCWc,1026
-better_git_of_theseus-0.4.2.dist-info/METADATA,sha256=ZqXuZz94OODNr50giDYHD5Iihw5D_38undkBRXIeHDI,3633
-better_git_of_theseus-0.4.2.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
-better_git_of_theseus-0.4.2.dist-info/entry_points.txt,sha256=dEBL6oCDAozY13Y_qxS_6-qkyCA7R2TpjoLH6QJR72g,66
-better_git_of_theseus-0.4.2.dist-info/top_level.txt,sha256=2kpp8WgiBzqVLxua_mBS00Nj4cUORaRbJi121THJ_0o,15
-better_git_of_theseus-0.4.2.dist-info/RECORD,,
+better_git_of_theseus-0.5.0.dist-info/METADATA,sha256=I4B84JL2IDJ-ICFiK3Su4uxkcqumW-h4PAWQdl0h1BI,3602
+better_git_of_theseus-0.5.0.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
+better_git_of_theseus-0.5.0.dist-info/entry_points.txt,sha256=dEBL6oCDAozY13Y_qxS_6-qkyCA7R2TpjoLH6QJR72g,66
+better_git_of_theseus-0.5.0.dist-info/top_level.txt,sha256=2kpp8WgiBzqVLxua_mBS00Nj4cUORaRbJi121THJ_0o,15
+better_git_of_theseus-0.5.0.dist-info/RECORD,,
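Each RECORD row has the form `path,hash,size`, where the hash is a SHA-256 digest encoded as URL-safe base64 with the `=` padding removed, as the wheel binary distribution format specifies. That is why `LICENSE`, `analyze.py` and the other untouched files keep identical hashes in both versions, while `app.py` (5192 → 8463 bytes) and `cmd.py` (639 → 550 bytes) get new ones, and why the RECORD file lists itself with empty hash and size fields. The helpers below are a minimal sketch; their names are hypothetical and not part of this package.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Wheel RECORD hash: SHA-256 digest, URL-safe base64, '=' padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def record_line(path: str, data: bytes) -> str:
    """Format one RECORD row as 'path,hash,size'."""
    return f"{path},{record_hash(data)},{len(data)}"


def verify_record_line(line: str, data: bytes) -> bool:
    """Check a RECORD row against file contents."""
    path, expected_hash, size = line.rsplit(",", 2)
    if not expected_hash:
        # RECORD's own row ('...dist-info/RECORD,,') carries no hash or size.
        return True
    return expected_hash == record_hash(data) and int(size) == len(data)
```

Running `verify_record_line` over every row of an unpacked wheel is a quick way to confirm which files really changed between two releases.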

git_of_theseus/app.py CHANGED

@@ -4,13 +4,26 @@ import tempfile
 import shutil
 try:
     from git_of_theseus.analyze import analyze
-    from git_of_theseus.plotly_plots import plotly_stack_plot, plotly_line_plot, plotly_survival_plot
+    from git_of_theseus.plotly_plots import plotly_stack_plot, plotly_line_plot, plotly_survival_plot, plotly_bar_plot
 except ImportError:
     from analyze import analyze
-    from plotly_plots import plotly_stack_plot, plotly_line_plot, plotly_survival_plot
+    from plotly_plots import plotly_stack_plot, plotly_line_plot, plotly_survival_plot, plotly_bar_plot
 st.set_page_config(page_title="Git of Theseus Dash", layout="wide")
+# GitHub Link in Sidebar
+st.sidebar.markdown(
+    """
+    <div style="display: flex; align-items: center; margin-bottom: 20px;">
+        <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="30" style="margin-right: 10px;">
+        <a href="https://github.com/onewesong/better-git-of-theseus" target="_blank" style="text-decoration: none; color: inherit; font-weight: bold;">
+            better-git-of-theseus
+        </a>
+    </div>
+    """,
+    unsafe_allow_html=True
+)
 st.title("📊 Git of Theseus - Repository Analysis")
 import sys
@@ -18,12 +31,49 @@ import sys
 # Sidebar Configuration
 st.sidebar.header("Configuration")
+with st.sidebar.expander("📖 How to use", expanded=False):
+    st.markdown("""
+    **Better Git of Theseus** is a tool to analyze the evolution of Git repositories.
+    ### Plots Explained:
+    - **Stack Plot**: Shows code growth over time, broken down by cohort (when code was added).
+    - **Line Plot**: Shows trends across different dimensions (Author, Extension, etc.).
+    - **Distribution**: Shows the **current** distribution (Who contributed most, which file types are dominant).
+    - **Survival Plot**: Estimates how long a line of code typically lasts before being modified or deleted.
+    ### Tips:
+    - **Cohort Format**: `%Y` (Yearly) and `%Y-%m` (Monthly) are recommended.
+    - **Mailmap**: Use a `.mailmap` file in the repo root to resolve duplicate author names.
+    """)
 default_repo = "."
 if len(sys.argv) > 1:
     default_repo = sys.argv[1]
-repo_path = st.sidebar.text_input("Git Repository Path", value=default_repo)
-branch = st.sidebar.text_input("Branch", value="master")
+repo_path = default_repo
+# Path display removed as per user request
+# Fetch branches for the selectbox
+try:
+    import git
+    repo = git.Repo(repo_path)
+    # Get local branches
+    branches = [h.name for h in repo.heads]
+    # Try to determine the best default branch (active one, or master/main)
+    try:
+        current_active = repo.active_branch.name
+    except:
+        current_active = "master"
+    if current_active in branches:
+        branches.remove(current_active)
+    options = [current_active] + sorted(branches)
+    branch = st.sidebar.selectbox("Branch", options=options)
+except Exception as e:
+    # Fallback if git repo access fails
+    branch = st.sidebar.text_input("Branch", value="master")
 with st.sidebar.expander("Analysis Parameters"):
     cohortfm = st.text_input(
@@ -35,9 +85,23 @@ with st.sidebar.expander("Analysis Parameters"):
              "- `%Y-W%W`: Week (e.g., 2023-W01)\n"
              "- `%Y-%m-%d`: Day"
     )
-    interval = st.number_input("Interval (seconds)", value=7 * 24 * 60 * 60)
-    procs = st.number_input("Processes", value=2, min_value=1)
-    ignore = st.text_area("Ignore (comma separated)").split(",")
+    interval = st.number_input(
+        "Analysis Interval (seconds)",
+        value=7 * 24 * 60 * 60,
+        help="The time step between data points. Default is 604800s (7 days). Larger values are faster; smaller values result in smoother curves."
+    )
+    st.caption(f"Current resolution: {interval / 86400:.1f} days")
+    procs = st.number_input(
+        "Parallel Processes",
+        value=2,
+        min_value=1,
+        help="Number of concurrent processes. Increase to speed up analysis on multi-core CPUs, but note it increases RAM usage."
+    )
+    ignore = st.text_area(
+        "Ignore Patterns",
+        help="Glob patterns to ignore (comma separated), e.g.: 'tests/**, *.md'"
+    ).split(",")
     ignore = [i.strip() for i in ignore if i.strip()]
 @st.cache_data(show_spinner=False)
@@ -71,7 +135,7 @@ if st.sidebar.button("🚀 Run Analysis") or (len(sys.argv) > 1 and st.session_s
 # Main View
 if st.session_state.analysis_results:
     results = st.session_state.analysis_results
-    tab1, tab2, tab3 = st.tabs(["Stack Plot", "Line Plot", "Survival Plot"])
+    tab1, tab2, tab3, tab4 = st.tabs(["Stack Plot", "Line Plot", "Distribution", "Survival Plot"])
     with tab1:
         st.header("Stack Plot")
@@ -115,6 +179,22 @@ if st.session_state.analysis_results:
                 st.warning(f"Data for {data_source_label_line} not found.")
     with tab3:
+        st.header("Latest Distribution")
+        col1, col2 = st.columns([1, 3])
+        with col1:
+            data_source_label_bar = st.selectbox("Data Source", list(source_map.keys()), key="bar_source")
+            data_key_bar = source_map[data_source_label_bar]
+            max_n_bar = st.slider("Max Series", 5, 100, 30, key="bar_max_n")
+        with col2:
+            project_name = os.path.basename(os.path.abspath(repo_path))
+            data_bar = results.get(data_key_bar)
+            if data_bar:
+                fig = plotly_bar_plot(data_bar, max_n=max_n_bar, title=f"{project_name} - {data_source_label_bar}")
+                st.plotly_chart(fig, width="stretch")
+            else:
+                st.warning(f"Data for {data_source_label_bar} not found.")
+    with tab4:
         st.header("Survival Plot")
         col1, col2 = st.columns([1, 3])
         with col1:
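The new branch selector lists the active branch first, followed by the remaining local branches in alphabetical order. If the active branch cannot be read (for example, on a detached HEAD), it falls back to `master`, even when no such local branch exists. The ordering can be isolated as a pure function for testing; `branch_options` below is a hypothetical helper sketched from the visible logic, not something app.py defines.

```python
from typing import Optional


def branch_options(local_branches: list[str], active: Optional[str],
                   fallback: str = "master") -> list[str]:
    """Mirror app.py's ordering: active (or fallback) branch first, the rest sorted."""
    current = active if active is not None else fallback
    # app.py removes the active branch from the list before sorting the rest.
    rest = sorted(b for b in local_branches if b != current)
    return [current] + rest


print(branch_options(["main", "dev", "feature"], "dev"))
print(branch_options(["main", "dev"], None))
```

In app.py the two inputs come from GitPython (`[h.name for h in repo.heads]` and `repo.active_branch.name`, which raises when HEAD is detached). Because the whole block sits inside `except Exception`, any failure to open the repository reverts to the free-text branch input.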

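Two of the reworded Analysis Parameters involve small conversions: the Ignore Patterns box is split on commas with blank entries dropped, and the caption reports the interval in days (`interval / 86400`, so the default 604800 s is shown as 7.0 days). The sketch below reproduces both. `is_ignored` is a hypothetical illustration using `fnmatch`; this diff does not show how `analyze()` matches the patterns, so its semantics here are an assumption.

```python
from fnmatch import fnmatch


def parse_ignore(text: str) -> list[str]:
    """Same parsing as app.py: split on commas, strip whitespace, drop empty entries."""
    return [p.strip() for p in text.split(",") if p.strip()]


def interval_caption(interval_seconds: float) -> str:
    """Format the resolution caption shown under the interval input."""
    return f"Current resolution: {interval_seconds / 86400:.1f} days"


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Hypothetical matcher: with fnmatch, '*' also matches '/', unlike shell globs."""
    return any(fnmatch(path, pattern) for pattern in patterns)


patterns = parse_ignore("tests/**, *.md,, ")
print(patterns, interval_caption(7 * 24 * 60 * 60))
```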
git_of_theseus/cmd.py CHANGED

@@ -7,9 +7,8 @@ def main():
     cmd_dir = os.path.dirname(os.path.abspath(__file__))
     app_path = os.path.join(cmd_dir, "app.py")
-    # The first argument is the repo path, default to current directory
-    repo_path = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
-    repo_path = os.path.abspath(repo_path)
+    # Always use the current working directory
+    repo_path = os.path.abspath(os.getcwd())
     # Run streamlit
     # We pass the repo_path as an argument to the streamlit script
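With this change the command always analyzes the current working directory, and a path given on the command line is no longer read. The lines that actually launch Streamlit fall outside this hunk. A common way to forward an argument to a Streamlit script is `streamlit run <script> -- <args>`; Streamlit then exposes the forwarded value to the script as `sys.argv[1]`, which is what app.py reads for `default_repo`. The sketch below builds such a command. It is an assumption about the launch step, not a copy of cmd.py, and both function names are hypothetical.

```python
import os
import subprocess
import sys


def build_streamlit_command(app_path: str, repo_path: str) -> list[str]:
    """Assumed launch command: arguments after '--' are passed through to the script."""
    return [sys.executable, "-m", "streamlit", "run", app_path, "--", repo_path]


def launch(app_path: str) -> int:
    # 0.5.0 behavior: the repository is always the current working directory.
    repo_path = os.path.abspath(os.getcwd())
    return subprocess.call(build_streamlit_command(app_path, repo_path))
```

In practice this means users now `cd` into the repository before starting the dashboard.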

git_of_theseus/plotly_plots.py CHANGED

@@ -7,12 +7,17 @@ import math
 import os
 from .utils import generate_n_colors
+# Harmonious, professional color palette (Modern & Muted)
+# Inspired by Tableau 20 and modern UI systems
+PREMIUM_PALETTE = [
+    "#4E79A7", "#A0CBE8", "#F28E2B", "#FFBE7D", "#59A14F",
+    "#8CD17D", "#B6992D", "#F1CE63", "#499894", "#86BCB6",
+    "#E15759", "#FF9D9A", "#79706E", "#BAB0AC", "#D37295",
+    "#FABFD2", "#B07AA1", "#D4A1D2", "#9D7660", "#D7B5A6"
+]
 def _process_stack_line_data(data, max_n=20, normalize=False):
-    # Handle dict or file path
-    # If it's a file path, load it? But app.py passes dict now.
-    # Let's assume dict for now as per app.py refactor.
     if not isinstance(data, dict):
-         # Fallback if needed, though app.py sends dict
         import json
         data = json.load(open(data))
@@ -20,34 +25,16 @@ def _process_stack_line_data(data, max_n=20, normalize=False):
     labels = data["labels"]
     ts = [dateutil.parser.parse(t) for t in data["ts"]]
-    # Sort and filter top N
     if y.shape[0] > max_n:
-        # Sort by max value in the series
         js = sorted(range(len(labels)), key=lambda j: max(y[j]), reverse=True)
-        # Calculate other sum
         other_indices = js[max_n:]
         if other_indices:
             other_sum = np.sum([y[j] for j in other_indices], axis=0)
-            # Top N indices
             top_js = sorted(js[:max_n], key=lambda j: labels[j])
             y = np.array([y[j] for j in top_js] + [other_sum])
             labels = [labels[j] for j in top_js] + ["other"]
-        else:
-            # Should hopefully not happen if shape[0] > max_n
-             pass
-    else:
-        # Sort alphabetically for consistency
-        js = range(len(labels))
-        # strictly speaking existing code didn't sort if <= max_n?
-        # "labels = data['labels']" in existing code.
-        pass
-    y_sums = np.sum(y, axis=0)
-    # Avoid division by zero
+    y_sums = np.sum(y, axis=0)
     y_sums[y_sums == 0] = 1.0
     if normalize:
@@ -57,43 +44,39 @@ def _process_stack_line_data(data, max_n=20, normalize=False):
 def plotly_stack_plot(data, max_n=20, normalize=False, title=None):
     ts, y, labels = _process_stack_line_data(data, max_n, normalize)
     fig = go.Figure()
-    # Use a nice color palette
-    colors = px.colors.qualitative.Plotly
-    if len(labels) > len(colors):
-        colors = px.colors.qualitative.Dark24 # More colors if needed
     for i, label in enumerate(labels):
-        color = colors[i % len(colors)]
+        color = PREMIUM_PALETTE[i % len(PREMIUM_PALETTE)]
         fig.add_trace(go.Scatter(
             x=ts,
             y=y[i],
             mode='lines',
             name=label,
-            stackgroup='one', # This enables stacking
-            line=dict(width=0.5, color=color),
-            fillcolor=color # Optional: specific fill color
+            stackgroup='one',
+            line=dict(width=0.5, color='rgba(255,255,255,0.3)'),
+            fillcolor=color,
+            hoverinfo='x+y+name'
         ))
     fig.update_layout(
-        title=dict(text=title, x=0.5) if title else None,
+        title=dict(text=title, x=0.5, font=dict(size=20)) if title else None,
         yaxis=dict(
-            title="Share of lines of code (%)" if normalize else "Lines of code",
-            range=[0, 100] if normalize else None
+            title="Share of LoC (%)" if normalize else "Lines of Code",
+            range=[0, 100.1] if normalize else None,
+            gridcolor='rgba(128,128,128,0.2)'
         ),
-        xaxis=dict(title="Date"),
+        xaxis=dict(title="Date", gridcolor='rgba(128,128,128,0.2)'),
         hovermode="x unified",
-        margin=dict(l=20, r=20, t=50, b=20),
+        margin=dict(l=20, r=20, t=60, b=20),
+        legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
+        plot_bgcolor='rgba(0,0,0,0)',
+        paper_bgcolor='rgba(0,0,0,0)'
     )
     return fig
 def plotly_line_plot(data, max_n=20, normalize=False, title=None):
     ts, y, labels = _process_stack_line_data(data, max_n, normalize)
     fig = go.Figure()
     for i, label in enumerate(labels):
@@ -102,40 +85,30 @@ def plotly_line_plot(data, max_n=20, normalize=False, title=None):
             y=y[i],
             mode='lines',
             name=label,
-            line=dict(width=2)
+            line=dict(width=2.5, color=PREMIUM_PALETTE[i % len(PREMIUM_PALETTE)])
         ))
     fig.update_layout(
-        title=dict(text=title, x=0.5) if title else None,
+        title=dict(text=title, x=0.5, font=dict(size=20)) if title else None,
         yaxis=dict(
-            title="Share of lines of code (%)" if normalize else "Lines of code",
-            range=[0, 100] if normalize else None
+            title="Share of LoC (%)" if normalize else "Lines of Code",
+            range=[0, 100.1] if normalize else None,
+            gridcolor='rgba(128,128,128,0.2)'
         ),
-        xaxis=dict(title="Date"),
+        xaxis=dict(title="Date", gridcolor='rgba(128,128,128,0.2)'),
         hovermode="x unified",
-        margin=dict(l=20, r=20, t=50, b=20),
+        margin=dict(l=20, r=20, t=60, b=20),
+        legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
+        plot_bgcolor='rgba(0,0,0,0)',
+        paper_bgcolor='rgba(0,0,0,0)'
     )
     return fig
 def plotly_survival_plot(commit_history, exp_fit=False, years=5, title=None):
-    # Logic copied from survival_plot.py
-    # commit_history is {sha: [[ts, count], ...]}
     deltas = collections.defaultdict(lambda: np.zeros(2))
     total_n = 0
     YEAR = 365.25 * 24 * 60 * 60
-    # Process history
-    # Input might be a list of histories if we support multiple inputs,
-    # but based on app.py we pass a single result["survival"] dict.
-    # However, existing survival_plot took a LIST of filenames.
-    # Let's support the single dict passed from app.py.
-    # The logic in survival_plot.py iterates over input_fns, loads them, and computes `all_deltas`.
-    # Here we assume `commit_history` IS the content of one such file (the dict).
     for commit, history in commit_history.items():
         t0, orig_count = history[0]
         total_n += orig_count
@@ -145,99 +118,86 @@ def plotly_survival_plot(commit_history, exp_fit=False, years=5, title=None):
             last_count = count
         deltas[history[-1][0] - t0] += (-last_count, -orig_count)
-    # Calculate curve
     P = 1.0
-    xs = []
-    ys = []
-    # Sort deltas by time
+    xs, ys = [], []
     sorted_times = sorted(deltas.keys())
-    total_k = total_n # unused?
     for t in sorted_times:
         delta_k, delta_n = deltas[t]
         xs.append(t / YEAR)
         ys.append(100.0 * P)
         if total_n > 0:
              P *= 1 + delta_k / total_n
-        # total_k += delta_k
         total_n += delta_n
-        if P < 0.05:
-            break
+        if P < 0.05: break
     fig = go.Figure()
-    # Main survival curve
     fig.add_trace(go.Scatter(
         x=xs, y=ys,
         mode='lines',
         name='Survival Rate',
-        line=dict(color='blue')
+        line=dict(color=PREMIUM_PALETTE[0], width=3)
     ))
-    # Exponential fit
     if exp_fit:
         try:
             import scipy.optimize
-            # Define loss function for fit
             def fit(k):
-                loss = 0.0
-                # Re-calculate P stream to fit k
-                # Need to iterate again or reuse data?
-                # The original code re-iterates.
-                # Simplified for single dataset:
-                curr_total_n = 0
-                for _, history in commit_history.items():
-                    curr_total_n += history[0][1]
-                P_fit = 1.0
-                curr_total_n_fit = curr_total_n
+                loss, curr_total_n = 0.0, sum(h[0][1] for h in commit_history.values())
+                P_fit, curr_total_n_fit = 1.0, curr_total_n
                 for t in sorted_times:
                     delta_k, delta_n = deltas[t]
                     pred = curr_total_n_fit * math.exp(-k * t / YEAR)
                     loss += (curr_total_n_fit * P_fit - pred) ** 2
-                    if curr_total_n_fit > 0:
-                        P_fit *= 1 + delta_k / curr_total_n_fit
+                    if curr_total_n_fit > 0: P_fit *= 1 + delta_k / curr_total_n_fit
                     curr_total_n_fit += delta_n
                 return loss
             k_opt = scipy.optimize.fmin(fit, 0.5, maxiter=50, disp=False)[0]
             ts_fit = np.linspace(0, years, 100)
             ys_fit = [100.0 * math.exp(-k_opt * t) for t in ts_fit]
             half_life = math.log(2) / k_opt
             fig.add_trace(go.Scatter(
                 x=ts_fit, y=ys_fit,
                 mode='lines',
                 name=f"Exp. Fit (Half-life: {half_life:.2f} yrs)",
-                line=dict(color='red', dash='dash')
+                line=dict(color=PREMIUM_PALETTE[10], dash='dash', width=2)
             ))
-        except ImportError:
-            pass # Or warn user
+        except ImportError: pass
     fig.update_layout(
         title=dict(text=title, x=0.5) if title else None,
-        yaxis=dict(
-            title="lines still present (%)",
-            range=[0, 100]
-        ),
-        xaxis=dict(
-            title="Years",
-            range=[0, years]
-        ),
+        yaxis=dict(title="Lines still present (%)", range=[0, 105], gridcolor='rgba(128,128,128,0.2)'),
+        xaxis=dict(title="Years", range=[0, years], gridcolor='rgba(128,128,128,0.2)'),
         hovermode="x unified",
         margin=dict(l=20, r=20, t=50, b=20),
+        plot_bgcolor='rgba(0,0,0,0)',
+        paper_bgcolor='rgba(0,0,0,0)'
     )
+    return fig
+def plotly_bar_plot(data, max_n=20, title=None):
+    _, y, labels = _process_stack_line_data(data, max_n, normalize=False)
+    latest_values = [row[-1] for row in y]
+    indices = sorted(range(len(labels)), key=lambda i: latest_values[i], reverse=True)
+    sorted_labels = [labels[i] for i in indices]
+    sorted_values = [latest_values[i] for i in indices]
+    fig = go.Figure(go.Bar(
+        x=sorted_labels,
+        y=sorted_values,
+        marker=dict(
+            color=sorted_values,
+            colorscale=[[i/(len(PREMIUM_PALETTE)-1), c] for i, c in enumerate(PREMIUM_PALETTE)],
+            showscale=False
+        )
+    ))
+    fig.update_layout(
+        title=dict(text=f"{title} (Latest)" if title else "Latest Distribution", x=0.5),
+        yaxis=dict(title="Lines of Code", gridcolor='rgba(128,128,128,0.2)'),
+        xaxis=dict(title=""),
+        margin=dict(l=20, r=20, t=50, b=100),
+        plot_bgcolor='rgba(0,0,0,0)',
+        paper_bgcolor='rgba(0,0,0,0)'
+    )
     return fig
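`_process_stack_line_data` keeps the `max_n` series with the highest peak value over the whole history, orders them alphabetically, and folds every remaining series into a single `other` row. The new `plotly_bar_plot` reuses that output and sorts the rows by their last value. The sketch below restates both steps with NumPy. The function names are hypothetical, and the normalization body is collapsed in the diff, so the percentage-per-timestamp formula is an assumption.

```python
import numpy as np


def top_n_with_other(y, labels, max_n=20, normalize=False):
    """Keep the max_n series with the highest peak; fold the rest into 'other'."""
    y = np.asarray(y, dtype=float)
    labels = list(labels)
    if y.shape[0] > max_n:
        # Rank series by their peak value over the whole history.
        js = sorted(range(len(labels)), key=lambda j: y[j].max(), reverse=True)
        top_js = sorted(js[:max_n], key=lambda j: labels[j])  # top N, alphabetical
        other = y[js[max_n:]].sum(axis=0)
        y = np.vstack([y[top_js], other])
        labels = [labels[j] for j in top_js] + ["other"]
    if normalize:
        # Assumed: share of total lines per timestamp; all-zero columns divide by 1.
        sums = y.sum(axis=0)
        sums[sums == 0] = 1.0
        y = 100.0 * y / sums
    return y, labels


def latest_distribution(y, labels):
    """Like plotly_bar_plot: last value of each series, sorted descending."""
    latest = [row[-1] for row in y]
    order = sorted(range(len(labels)), key=lambda i: latest[i], reverse=True)
    return [labels[i] for i in order], [latest[i] for i in order]


y, labels = top_n_with_other([[1, 2], [5, 1], [3, 3]], ["a", "b", "c"], max_n=2)
print(latest_distribution(y, labels))
```

Because series are selected by historical peak and only then sorted by their latest value, the Distribution tab can show a bar for a series whose code has mostly been replaced, while a series that grew only recently may be folded into `other`.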

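`plotly_survival_plot` turns per-commit line counts into a survival curve: each drop in a commit's surviving lines lowers the running survival fraction `P` in proportion to the lines still at risk, and a commit's lines leave the at-risk pool at its last observation. The sketch below restates the visible loop as a standalone function. Two caveats: the per-sample inner loop is hidden in the collapsed part of the hunk, so the `for t, count in history[1:]` step is an assumption built around the visible `last_count = count` line; and `half_life_years` swaps the package's `scipy.optimize.fmin` fit on a squared-error loss for a closed-form log-linear least-squares fit, so its estimate can differ from the value shown in the chart legend. Both convert the decay rate `k` to a half-life the same way, `ln(2) / k`.

```python
import collections
import math

import numpy as np

YEAR = 365.25 * 24 * 60 * 60


def survival_curve(commit_history):
    """commit_history: {sha: [[unix_ts, surviving_lines], ...]} -> (years, percent)."""
    deltas = collections.defaultdict(lambda: np.zeros(2))
    total_n = 0
    for history in commit_history.values():
        t0, orig_count = history[0]
        total_n += orig_count
        last_count = orig_count
        # Assumed inner step: record each change relative to the commit time.
        for t, count in history[1:]:
            deltas[t - t0] += (count - last_count, 0)
            last_count = count
        # Remaining lines leave the at-risk pool at the last observation.
        deltas[history[-1][0] - t0] += (-last_count, -orig_count)

    P, xs, ys = 1.0, [], []
    for t in sorted(deltas):
        delta_k, delta_n = deltas[t]
        xs.append(t / YEAR)
        ys.append(100.0 * P)
        if total_n > 0:
            P *= 1 + delta_k / total_n
        total_n += delta_n
        if P < 0.05:
            break
    return xs, ys


def half_life_years(xs, ys):
    """Swapped-in fit: least squares of ln(P) = -k*t through the origin; returns ln(2)/k."""
    t = np.asarray(xs, dtype=float)
    p = np.asarray(ys, dtype=float) / 100.0
    mask = p > 0  # log is undefined where nothing survives
    t, p = t[mask], p[mask]
    k = -np.sum(t * np.log(p)) / np.sum(t * t)
    return math.log(2) / k
```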
{better_git_of_theseus-0.4.2.dist-info → better_git_of_theseus-0.5.0.dist-info}/WHEEL RENAMED

File without changes

{better_git_of_theseus-0.4.2.dist-info → better_git_of_theseus-0.5.0.dist-info}/entry_points.txt RENAMED

File without changes

{better_git_of_theseus-0.4.2.dist-info → better_git_of_theseus-0.5.0.dist-info}/licenses/LICENSE RENAMED

File without changes

{better_git_of_theseus-0.4.2.dist-info → better_git_of_theseus-0.5.0.dist-info}/top_level.txt RENAMED

File without changes
