logdetective 1.4.0__tar.gz → 1.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. {logdetective-1.4.0 → logdetective-1.6.0}/PKG-INFO +30 -4
  2. {logdetective-1.4.0 → logdetective-1.6.0}/README.md +28 -2
  3. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/constants.py +0 -11
  4. logdetective-1.6.0/logdetective/extractors.py +55 -0
  5. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/logdetective.py +35 -22
  6. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/models.py +32 -6
  7. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/prompts.yml +0 -11
  8. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/remote_log.py +2 -4
  9. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/config.py +12 -5
  10. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/emoji.py +3 -1
  11. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/llm.py +11 -2
  12. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/plot.py +36 -35
  13. logdetective-1.6.0/logdetective/skip_snippets.yml +12 -0
  14. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/utils.py +34 -14
  15. {logdetective-1.4.0 → logdetective-1.6.0}/pyproject.toml +2 -2
  16. logdetective-1.4.0/logdetective/extractors.py +0 -105
  17. {logdetective-1.4.0 → logdetective-1.6.0}/LICENSE +0 -0
  18. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/__init__.py +0 -0
  19. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/drain3.ini +0 -0
  20. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/prompts-summary-first.yml +0 -0
  21. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/prompts-summary-only.yml +0 -0
  22. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/__init__.py +0 -0
  23. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/compressors.py +0 -0
  24. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/database/__init__.py +0 -0
  25. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/database/base.py +0 -0
  26. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/database/models/__init__.py +0 -0
  27. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/database/models/merge_request_jobs.py +0 -0
  28. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/database/models/metrics.py +0 -0
  29. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/gitlab.py +0 -0
  30. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/metric.py +0 -0
  31. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/models.py +0 -0
  32. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/server.py +0 -0
  33. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/templates/gitlab_full_comment.md.j2 +0 -0
  34. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective/server/templates/gitlab_short_comment.md.j2 +0 -0
  35. {logdetective-1.4.0 → logdetective-1.6.0}/logdetective.1.asciidoc +0 -0
@@ -1,12 +1,12 @@
  Metadata-Version: 2.3
  Name: logdetective
- Version: 1.4.0
+ Version: 1.6.0
  Summary: Log using LLM AI to search for build/test failures and provide ideas for fixing these.
  License: Apache-2.0
  Author: Jiri Podivin
  Author-email: jpodivin@gmail.com
  Requires-Python: >=3.11,<4.0
- Classifier: Development Status :: 4 - Beta
+ Classifier: Development Status :: 5 - Production/Stable
  Classifier: Environment :: Console
  Classifier: Intended Audience :: Developers
  Classifier: License :: OSI Approved :: Apache Software License
@@ -87,9 +87,10 @@ Usage
  To analyze a log file, run the script with the following command line arguments:
  - `url` (required): The URL of the log file to be analyzed.
  - `--model` (optional, default: "Mistral-7B-Instruct-v0.2-GGUF"): The path or URL of the language model for analysis. As we are using LLama.cpp we want this to be in the `gguf` format. You can include the download link to the model here. If the model is already on your machine it will skip the download.
- - `--summarizer` (optional, default: "drain"): Choose between LLM and Drain template miner as the log summarizer. You can also provide the path to an existing language model file instead of using a URL.
- - `--n_lines` (optional, default: 8): The number of lines per chunk for LLM analysis. This only makes sense when you are summarizing with LLM.
+ - `--summarizer` (DISABLED, default: "drain"): The LLM summarization option was removed; the argument is kept for backward compatibility only.
+ - `--n_lines` (DISABLED): The LLM summarization option was removed; the argument is kept for backward compatibility only.
  - `--n_clusters` (optional, default 8): Number of clusters for Drain to organize log chunks into. This only makes sense when you are summarizing with Drain
+ - `--skip_snippets` (optional): Path to patterns for skipping snippets.

  Example usage:

@@ -376,6 +377,9 @@ HTTPS certificate generated through:
  certbot certonly --standalone -d logdetective01.fedorainfracloud.org
  ```

+ Certificates need to be placed into the location specified by the `LOGDETECTIVE_CERTDIR`
+ env var, and the service should be restarted.
+
  Querying statistics
  -------------------

@@ -435,6 +439,28 @@ with spaces, or replacement fields marked with curly braces, `{}` left for inser
  Number of replacement fields in new prompts, must be the same as in originals.
  Although their position may be different.

+
+ Skip Snippets
+ -------------
+
+ Certain log chunks may not contribute to the analysis of the problem under any circumstances.
+ Users can specify regular expressions matching such log chunks, along with a simple description,
+ using the Skip Snippets feature.
+
+ Patterns to be skipped must be defined in a YAML file as a dictionary, where each key is a description
+ and each value is a regular expression. For example:
+
+ ```
+ child_exit_code_zero: "Child return code was: 0"
+ ```
+
+ Special care must be taken not to write a regular expression which may match
+ too many chunks, or which may be evaluated as a data structure by the YAML parser.
+
+ An example of a valid pattern definition file, `logdetective/skip_snippets.yml`,
+ can be used as a starting point and is used as the default if no other definition is provided.
+
+
  License
  -------

@@ -43,9 +43,10 @@ Usage
  To analyze a log file, run the script with the following command line arguments:
  - `url` (required): The URL of the log file to be analyzed.
  - `--model` (optional, default: "Mistral-7B-Instruct-v0.2-GGUF"): The path or URL of the language model for analysis. As we are using LLama.cpp we want this to be in the `gguf` format. You can include the download link to the model here. If the model is already on your machine it will skip the download.
- - `--summarizer` (optional, default: "drain"): Choose between LLM and Drain template miner as the log summarizer. You can also provide the path to an existing language model file instead of using a URL.
- - `--n_lines` (optional, default: 8): The number of lines per chunk for LLM analysis. This only makes sense when you are summarizing with LLM.
+ - `--summarizer` (DISABLED, default: "drain"): The LLM summarization option was removed; the argument is kept for backward compatibility only.
+ - `--n_lines` (DISABLED): The LLM summarization option was removed; the argument is kept for backward compatibility only.
  - `--n_clusters` (optional, default 8): Number of clusters for Drain to organize log chunks into. This only makes sense when you are summarizing with Drain
+ - `--skip_snippets` (optional): Path to patterns for skipping snippets.

  Example usage:

@@ -332,6 +333,9 @@ HTTPS certificate generated through:
  certbot certonly --standalone -d logdetective01.fedorainfracloud.org
  ```

+ Certificates need to be placed into the location specified by the `LOGDETECTIVE_CERTDIR`
+ env var, and the service should be restarted.
+
  Querying statistics
  -------------------

@@ -391,6 +395,28 @@ with spaces, or replacement fields marked with curly braces, `{}` left for inser
  Number of replacement fields in new prompts, must be the same as in originals.
  Although their position may be different.

+
+ Skip Snippets
+ -------------
+
+ Certain log chunks may not contribute to the analysis of the problem under any circumstances.
+ Users can specify regular expressions matching such log chunks, along with a simple description,
+ using the Skip Snippets feature.
+
+ Patterns to be skipped must be defined in a YAML file as a dictionary, where each key is a description
+ and each value is a regular expression. For example:
+
+ ```
+ child_exit_code_zero: "Child return code was: 0"
+ ```
+
+ Special care must be taken not to write a regular expression which may match
+ too many chunks, or which may be evaluated as a data structure by the YAML parser.
+
+ An example of a valid pattern definition file, `logdetective/skip_snippets.yml`,
+ can be used as a starting point and is used as the default if no other definition is provided.
+
+
  License
  -------

@@ -26,17 +26,6 @@ Analysis:

  """

- SUMMARIZATION_PROMPT_TEMPLATE = """
- Does following log contain error or issue?
-
- Log:
-
- {}
-
- Answer:
-
- """
-
  SNIPPET_PROMPT_TEMPLATE = """
  Analyse following RPM build log snippet. Describe contents accurately, without speculation or suggestions for resolution.

@@ -0,0 +1,55 @@
+ import os
+ import logging
+ from typing import Tuple
+
+ import drain3
+ from drain3.template_miner_config import TemplateMinerConfig
+
+ from logdetective.utils import get_chunks, filter_snippet_patterns
+ from logdetective.models import SkipSnippets
+
+ LOG = logging.getLogger("logdetective")
+
+
+ class DrainExtractor:
+     """A class that extracts information from logs using a template miner algorithm."""
+
+     def __init__(
+         self,
+         verbose: bool = False,
+         context: bool = False,
+         max_clusters=8,
+         skip_snippets: SkipSnippets = SkipSnippets({}),
+     ):
+         config = TemplateMinerConfig()
+         config.load(f"{os.path.dirname(__file__)}/drain3.ini")
+         config.profiling_enabled = verbose
+         config.drain_max_clusters = max_clusters
+         self.miner = drain3.TemplateMiner(config=config)
+         self.verbose = verbose
+         self.context = context
+         self.skip_snippets = skip_snippets
+
+     def __call__(self, log: str) -> list[Tuple[int, str]]:
+         out = []
+         # Create chunks
+         chunks = list(get_chunks(log))
+         # Keep only chunks that don't match any of the excluded patterns
+         chunks = [
+             (_, chunk)
+             for _, chunk in chunks
+             if not filter_snippet_patterns(chunk, self.skip_snippets)
+         ]
+         # First pass create clusters
+         for _, chunk in chunks:
+             processed_chunk = self.miner.add_log_message(chunk)
+             LOG.debug(processed_chunk)
+         clusters = list(self.miner.drain.clusters)
+         # Second pass, only matching lines with clusters,
+         # to recover original text
+         for chunk_start, chunk in chunks:
+             cluster = self.miner.match(chunk, "always")
+             if cluster in clusters:
+                 out.append((chunk_start, chunk))
+                 clusters.remove(cluster)
+         return out
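For orientation, a minimal usage sketch of the reworked extractor. The log content and the skip pattern below are invented for illustration; the constructor signature matches the code above.

```python
# A minimal sketch, assuming the package layout shown in this diff.
# The sample log line and skip pattern are illustrative, not from the source.
from logdetective.extractors import DrainExtractor
from logdetective.models import SkipSnippets

skip = SkipSnippets({"child_exit_code_zero": "Child return code was: 0"})
extractor = DrainExtractor(context=True, max_clusters=8, skip_snippets=skip)

log = "Child return code was: 0\nerror: linker command failed\n"
for chunk_start, chunk in extractor(log):
    print(chunk_start, chunk)  # chunks matching a skip pattern never appear
```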
@@ -14,8 +14,9 @@ from logdetective.utils import (
      format_snippets,
      compute_certainty,
      load_prompts,
+     load_skip_snippet_patterns,
  )
- from logdetective.extractors import LLMExtractor, DrainExtractor
+ from logdetective.extractors import DrainExtractor

  LOG = logging.getLogger("logdetective")

@@ -49,16 +50,16 @@ def setup_args():
          "--summarizer",
          type=str,
          default="drain",
-         help="Choose between LLM and Drain template miner as the log summarizer.\
-             LLM must be specified as path to a model, URL or local file.",
+         help="DISABLED: LLM summarization option was removed. \
+             Argument is kept for backward compatibility only.",
      )
      parser.add_argument(
          "-N",
          "--n_lines",
          type=int,
-         default=8,
-         help="The number of lines per chunk for LLM analysis.\
-             This only makes sense when you are summarizing with LLM.",
+         default=None,
+         help="DISABLED: LLM summarization option was removed. \
+             Argument is kept for backward compatibility only.",
      )
      parser.add_argument(
          "-C",
@@ -74,13 +75,19 @@ def setup_args():
          "--prompts",
          type=str,
          default=f"{os.path.dirname(__file__)}/prompts.yml",
-         help="Path to prompt configuration file."
+         help="Path to prompt configuration file.",
      )
      parser.add_argument(
          "--temperature",
          type=float,
          default=DEFAULT_TEMPERATURE,
-         help="Temperature for inference."
+         help="Temperature for inference.",
+     )
+     parser.add_argument(
+         "--skip_snippets",
+         type=str,
+         default=f"{os.path.dirname(__file__)}/skip_snippets.yml",
+         help="Path to patterns for skipping snippets.",
      )
      return parser.parse_args()

@@ -93,6 +100,10 @@ async def run(): # pylint: disable=too-many-statements,too-many-locals
          sys.stderr.write("Error: --quiet and --verbose is mutually exclusive.\n")
          sys.exit(2)

+     # Emit warning about use of discontinued args
+     if args.n_lines or args.summarizer != "drain":
+         LOG.warning("LLM based summarization was removed. Drain will be used instead.")
+
      # Logging facility setup
      log_level = logging.INFO
      if args.verbose >= 1:
@@ -116,18 +127,19 @@ async def run(): # pylint: disable=too-many-statements,too-many-locals
          LOG.error("You likely do not have enough memory to load the AI model")
          sys.exit(3)

-     # Log file summarizer selection and initialization
-     if args.summarizer == "drain":
-         extractor = DrainExtractor(
-             args.verbose > 1, context=True, max_clusters=args.n_clusters
-         )
-     else:
-         summarizer_model = initialize_model(args.summarizer, verbose=args.verbose > 2)
-         extractor = LLMExtractor(
-             summarizer_model,
-             args.verbose > 1,
-             prompts_configuration.summarization_prompt_template,
-         )
+     try:
+         skip_snippets = load_skip_snippet_patterns(args.skip_snippets)
+     except OSError as e:
+         LOG.error(e)
+         sys.exit(5)
+
+     # Log file summarizer initialization
+     extractor = DrainExtractor(
+         args.verbose > 1,
+         context=True,
+         max_clusters=args.n_clusters,
+         skip_snippets=skip_snippets,
+     )

      LOG.info("Getting summary")

@@ -151,7 +163,8 @@ async def run(): # pylint: disable=too-many-statements,too-many-locals

      prompt = (
          f"{prompts_configuration.default_system_prompt}\n"
-         f"{prompts_configuration.prompt_template}")
+         f"{prompts_configuration.prompt_template}"
+     )

      stream = True
      if args.no_stream:
@@ -191,7 +204,7 @@ async def run(): # pylint: disable=too-many-statements,too-many-locals


  def main():
-     """ Evaluate logdetective program and wait for it to finish """
+     """Evaluate logdetective program and wait for it to finish"""
      asyncio.run(run())

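The new failure mode introduced above is worth noting: if the patterns file cannot be read, the CLI exits with status 5. A sketch of that path, using a deliberately nonexistent path:

```python
# Sketch of the CLI's new failure mode; the path below is deliberately bogus.
import sys
from logdetective.utils import load_skip_snippet_patterns

try:
    skip_snippets = load_skip_snippet_patterns("/nonexistent/skip_snippets.yml")
except OSError as e:
    print(f"error: {e}", file=sys.stderr)
    sys.exit(5)  # mirrors run()'s handling shown above
```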
@@ -1,10 +1,10 @@
+ import re
  from typing import Optional
- from pydantic import BaseModel
+ from pydantic import BaseModel, model_validator

  from logdetective.constants import (
      PROMPT_TEMPLATE,
      PROMPT_TEMPLATE_STAGED,
-     SUMMARIZATION_PROMPT_TEMPLATE,
      SNIPPET_PROMPT_TEMPLATE,
      DEFAULT_SYSTEM_PROMPT,
  )
@@ -14,7 +14,6 @@ class PromptConfig(BaseModel):
      """Configuration for basic log detective prompts."""

      prompt_template: str = PROMPT_TEMPLATE
-     summarization_prompt_template: str = SUMMARIZATION_PROMPT_TEMPLATE
      snippet_prompt_template: str = SNIPPET_PROMPT_TEMPLATE
      prompt_template_staged: str = PROMPT_TEMPLATE_STAGED

@@ -27,9 +26,6 @@ class PromptConfig(BaseModel):
          if data is None:
              return
          self.prompt_template = data.get("prompt_template", PROMPT_TEMPLATE)
-         self.summarization_prompt_template = data.get(
-             "summarization_prompt_template", SUMMARIZATION_PROMPT_TEMPLATE
-         )
          self.snippet_prompt_template = data.get(
              "snippet_prompt_template", SNIPPET_PROMPT_TEMPLATE
          )
@@ -45,3 +41,33 @@ class PromptConfig(BaseModel):
          self.staged_system_prompt = data.get(
              "staged_system_prompt", DEFAULT_SYSTEM_PROMPT
          )
+
+
+ class SkipSnippets(BaseModel):
+     """Regular expressions defining snippets we should not analyze"""
+
+     snippet_patterns: dict[str, re.Pattern] = {}
+
+     def __init__(self, data: Optional[dict] = None):
+         super().__init__(data=data)
+         if data is None:
+             return
+         self.snippet_patterns = {
+             key: re.compile(pattern) for key, pattern in data.items()
+         }
+
+     @model_validator(mode="before")
+     @classmethod
+     def check_patterns(cls, data: dict):
+         """Check if all supplied patterns are valid regular expressions.
+         Technically replicating what is done in __init__ but with a nicer error message."""
+         patterns = data["data"]
+         for key, pattern in patterns.items():
+             try:
+                 re.compile(pattern=pattern)
+             except (TypeError, re.error) as ex:
+                 raise ValueError(
+                     f"Invalid pattern `{pattern}` with name `{key}` supplied for skipping in logs."
+                 ) from ex
+
+         return data
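A short behavioral sketch of the validator. The pattern strings are invented; it assumes pydantic v2, where errors raised inside validators surface as `ValueError` subclasses:

```python
# Illustrative only; the pattern values below are invented.
from logdetective.models import SkipSnippets

ok = SkipSnippets({"child_exit_code_zero": "Child return code was: 0"})
print(sorted(ok.snippet_patterns))  # ['child_exit_code_zero']

try:
    SkipSnippets({"broken": "(unclosed"})  # invalid regex: unbalanced paren
except ValueError as err:
    print(err)  # "Invalid pattern `(unclosed` with name `broken` supplied ..."
```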
@@ -21,17 +21,6 @@ prompt_template: |

    Analysis:

-
- summarization_prompt_template: |
-   Does following log contain error or issue?
-
-   Log:
-
-   {}
-
-   Answer:
-
-
  snippet_prompt_template: |
    Analyse following RPM build log snippet. Describe contents accurately, without speculation or suggestions for resolution.

@@ -53,7 +53,7 @@ class RemoteLog:
          LOG.debug("process url %s", self.url)
          try:
              response = await self._http_session.get(self.url, raise_for_status=True)
-         except aiohttp.ClientResponseError as ex:
+         except (aiohttp.ClientResponseError, aiohttp.ClientConnectorError) as ex:
              raise RuntimeError(f"We couldn't obtain the logs: {ex}") from ex
          return await response.text()
      LOG.error("Invalid URL received ")
@@ -64,6 +64,4 @@ class RemoteLog:
      try:
          return await self.get_url_content()
      except RuntimeError as ex:
-         raise HTTPBadRequest(
-             reason=f"We couldn't obtain the logs: {ex}"
-         ) from ex
+         raise HTTPBadRequest(reason=f"We couldn't obtain the logs: {ex}") from ex
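The widened `except` now also converts connection-level failures (refused connections, DNS errors) into the same `RuntimeError`. A self-contained sketch of the pattern, independent of `RemoteLog`:

```python
# Standalone sketch of the widened error handling; not package code.
import asyncio
import aiohttp

async def fetch(url: str) -> str:
    async with aiohttp.ClientSession() as session:
        try:
            response = await session.get(url, raise_for_status=True)
        except (aiohttp.ClientResponseError, aiohttp.ClientConnectorError) as ex:
            # HTTP error statuses *and* connection failures now take this path
            raise RuntimeError(f"We couldn't obtain the logs: {ex}") from ex
        return await response.text()

# A refused connection surfaces as RuntimeError, not a raw aiohttp error:
try:
    asyncio.run(fetch("http://127.0.0.1:9/build.log"))  # port 9: likely refused
except RuntimeError as err:
    print(err)
```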
@@ -3,8 +3,9 @@ import logging
  import yaml
  from openai import AsyncOpenAI

- from logdetective.utils import load_prompts
+ from logdetective.utils import load_prompts, load_skip_snippet_patterns
  from logdetective.server.models import Config, InferenceConfig
+ import logdetective


  def load_server_config(path: str | None) -> Config:
@@ -52,18 +53,24 @@ def get_log(config: Config):


  def get_openai_api_client(ineference_config: InferenceConfig):
-     """Set up AsyncOpenAI client with default configuration.
-     """
+     """Set up AsyncOpenAI client with default configuration."""
      return AsyncOpenAI(
-         api_key=ineference_config.api_token,
-         base_url=ineference_config.url)
+         api_key=ineference_config.api_token, base_url=ineference_config.url
+     )


  SERVER_CONFIG_PATH = os.environ.get("LOGDETECTIVE_SERVER_CONF", None)
  SERVER_PROMPT_PATH = os.environ.get("LOGDETECTIVE_PROMPTS", None)
+ # The default location for skip patterns is in the same directory
+ # as logdetective __init__.py file.
+ SERVER_SKIP_PATTERNS_PATH = os.environ.get(
+     "LOGDETECIVE_SKIP_PATTERNS",
+     f"{os.path.dirname(logdetective.__file__)}/skip_snippets.yml",
+ )

  SERVER_CONFIG = load_server_config(SERVER_CONFIG_PATH)
  PROMPT_CONFIG = load_prompts(SERVER_PROMPT_PATH)
+ SKIP_SNIPPETS_CONFIG = load_skip_snippet_patterns(SERVER_SKIP_PATTERNS_PATH)

  LOG = get_log(SERVER_CONFIG)

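For deployments, the skip-pattern file can be overridden through the environment before the config module is imported. A hypothetical override, assuming the rest of the server configuration can load in your environment (the variable name is taken verbatim from `config.py` above, including its spelling):

```python
# Hypothetical override; the path is an example, not a documented location.
import os
os.environ["LOGDETECIVE_SKIP_PATTERNS"] = "/etc/logdetective/skip_snippets.yml"

# Importing the module after setting the variable picks up the custom file.
from logdetective.server.config import SKIP_SNIPPETS_CONFIG
print(sorted(SKIP_SNIPPETS_CONFIG.snippet_patterns))
```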
@@ -51,7 +51,9 @@ async def _handle_gitlab_operation(func: Callable, *args):
      else:
          LOG.exception(log_msg)
  except Exception as e:  # pylint: disable=broad-exception-caught
-     LOG.exception("Unexpected error during GitLab operation %s(%s): %s", func, args, e)
+     LOG.exception(
+         "Unexpected error during GitLab operation %s(%s): %s", func, args, e
+     )


  async def collect_emojis_in_comments(  # pylint: disable=too-many-locals
@@ -16,7 +16,13 @@ from logdetective.utils import (
      compute_certainty,
      prompt_to_messages,
  )
- from logdetective.server.config import LOG, SERVER_CONFIG, PROMPT_CONFIG, CLIENT
+ from logdetective.server.config import (
+     LOG,
+     SERVER_CONFIG,
+     PROMPT_CONFIG,
+     CLIENT,
+     SKIP_SNIPPETS_CONFIG,
+ )
  from logdetective.server.models import (
      AnalyzedSnippet,
      InferenceConfig,
@@ -42,7 +48,10 @@ def format_analyzed_snippets(snippets: list[AnalyzedSnippet]) -> str:
  def mine_logs(log: str) -> List[Tuple[int, str]]:
      """Extract snippets from log text"""
      extractor = DrainExtractor(
-         verbose=True, context=True, max_clusters=SERVER_CONFIG.extractor.max_clusters
+         verbose=True,
+         context=True,
+         max_clusters=SERVER_CONFIG.extractor.max_clusters,
+         skip_snippets=SKIP_SNIPPETS_CONFIG,
      )

      LOG.info("Getting summary")
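A sketch of the server-side entry point after this change, assuming the server configuration (imported transitively through `logdetective.server.config`) can be loaded in the current environment; the log line is invented:

```python
# Hypothetical call; requires a loadable server configuration.
from logdetective.server.llm import mine_logs

log = "configure: error: no acceptable C compiler found in $PATH\n"
for line_number, snippet in mine_logs(log):
    print(line_number, snippet)
```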
@@ -2,12 +2,10 @@ import datetime
  from typing import Optional, Union, Dict

  import numpy
- import matplotlib
- import matplotlib.figure
- import matplotlib.pyplot
+ from numpy.typing import ArrayLike
+ from matplotlib import dates, colormaps, axes, pyplot, figure

- from matplotlib.pyplot import cm
- from logdetective.server import models
+ from logdetective.server.models import TimePeriod
  from logdetective.server.database.models import (
      AnalyzeRequestMetrics,
      EndpointType,
@@ -18,25 +16,25 @@ from logdetective.server.database.models import (
  class Definition:
      """Define plot details, given a time period."""

-     def __init__(self, time_period: models.TimePeriod):
+     def __init__(self, time_period: TimePeriod):
          self.time_period = time_period
          self.days_diff = time_period.get_time_period().days
          if self.time_period.hours:
              self._freq = "H"
              self._time_format = "%Y-%m-%d %H"
-             self._locator = matplotlib.dates.HourLocator(interval=2)
+             self._locator = dates.HourLocator(interval=2)
              self._time_unit = "hour"
              self._time_delta = datetime.timedelta(hours=1)
          elif self.time_period.days:
              self._freq = "D"
              self._time_format = "%Y-%m-%d"
-             self._locator = matplotlib.dates.DayLocator(interval=1)
+             self._locator = dates.DayLocator(interval=1)
              self._time_unit = "day"
              self._time_delta = datetime.timedelta(days=1)
          elif self.time_period.weeks:
              self._freq = "W"
              self._time_format = "%Y-%m-%d"
-             self._locator = matplotlib.dates.WeekdayLocator(interval=1)
+             self._locator = dates.WeekdayLocator(interval=1)
              self._time_unit = "week"
              self._time_delta = datetime.timedelta(weeks=1)

@@ -120,10 +118,10 @@ def create_time_series_arrays(


  def _add_bar_chart(
-     ax: matplotlib.figure.Axes,
+     ax: axes.Axes,
      plot_def: Definition,
-     timestamps: numpy.array,
-     values: numpy.array,
+     timestamps: ArrayLike,
+     values: ArrayLike,
      label: str,
  ) -> None:
      """Add a blue bar chart"""
@@ -142,18 +140,18 @@ def _add_bar_chart(
      ax.set_ylabel(label, color="blue")
      ax.tick_params(axis="y", labelcolor="blue")

-     ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter(plot_def.time_format))
+     ax.xaxis.set_major_formatter(dates.DateFormatter(plot_def.time_format))
      ax.xaxis.set_major_locator(plot_def.locator)

-     matplotlib.pyplot.xticks(rotation=45)
+     pyplot.xticks(rotation=45)

      ax.grid(True, alpha=0.3)


  def _add_line_chart(  # pylint: disable=too-many-arguments disable=too-many-positional-arguments
-     ax: matplotlib.figure.Axes,
-     timestamps: numpy.array,
-     values: numpy.array,
+     ax: axes.Axes,
+     timestamps: ArrayLike,
+     values: ArrayLike,
      label: str,
      color: str = "red",
      set_label: bool = True,
@@ -166,10 +164,10 @@ def _add_line_chart(  # pylint: disable=too-many-arguments disable=too-many-posi


  def requests_per_time(
-     period_of_time: models.TimePeriod,
+     period_of_time: TimePeriod,
      endpoint: EndpointType = EndpointType.ANALYZE,
      end_time: Optional[datetime.datetime] = None,
- ) -> matplotlib.figure.Figure:
+ ) -> figure.Figure:
      """
      Generate a visualization of request counts over a specified time period.

@@ -200,13 +198,13 @@ def requests_per_time(
          requests_counts, plot_def, start_time, end_time
      )

-     fig, ax1 = matplotlib.pyplot.subplots(figsize=(12, 6))
+     fig, ax1 = pyplot.subplots(figsize=(12, 6))
      _add_bar_chart(ax1, plot_def, timestamps, counts, "Requests")

      ax2 = ax1.twinx()
      _add_line_chart(ax2, timestamps, numpy.cumsum(counts), "Cumulative Requests")

-     matplotlib.pyplot.title(
+     pyplot.title(
          f"Requests received for API {endpoint} ({start_time.strftime(plot_def.time_format)} "
          f"to {end_time.strftime(plot_def.time_format)})"
      )
@@ -215,16 +213,16 @@ def requests_per_time(
      lines2, labels2 = ax2.get_legend_handles_labels()
      ax1.legend(lines1 + lines2, labels1 + labels2, loc="center")

-     matplotlib.pyplot.tight_layout()
+     pyplot.tight_layout()

      return fig


  def average_time_per_responses(  # pylint: disable=too-many-locals
-     period_of_time: models.TimePeriod,
+     period_of_time: TimePeriod,
      endpoint: EndpointType = EndpointType.ANALYZE,
      end_time: Optional[datetime.datetime] = None,
- ) -> matplotlib.figure.Figure:
+ ) -> figure.Figure:
      """
      Generate a visualization of average response time and length over a specified time period.

@@ -259,7 +257,7 @@ def average_time_per_responses(  # pylint: disable=too-many-locals
          float,
      )

-     fig, ax1 = matplotlib.pyplot.subplots(figsize=(12, 6))
+     fig, ax1 = pyplot.subplots(figsize=(12, 6))
      _add_bar_chart(
          ax1, plot_def, timestamps, average_time, "average response time (seconds)"
      )
@@ -280,7 +278,7 @@ def average_time_per_responses(  # pylint: disable=too-many-locals
      ax2 = ax1.twinx()
      _add_line_chart(ax2, timestamps, average_length, "average response length (chars)")

-     matplotlib.pyplot.title(
+     pyplot.title(
          f"average response time for API {endpoint} ({start_time.strftime(plot_def.time_format)} "
          f"to {end_time.strftime(plot_def.time_format)})"
      )
@@ -289,7 +287,7 @@ def average_time_per_responses(  # pylint: disable=too-many-locals
      lines2, labels2 = ax2.get_legend_handles_labels()
      ax1.legend(lines1 + lines2, labels1 + labels2, loc="center")

-     matplotlib.pyplot.tight_layout()
+     pyplot.tight_layout()

      return fig

@@ -322,7 +320,7 @@ def _collect_emoji_data(


  def _plot_emoji_data(  # pylint: disable=too-many-locals
-     ax: matplotlib.figure.Axes,
+     ax: axes.Axes,
      reactions_values_dict: Dict[str, Dict[datetime.datetime, int]],
      plot_def: Definition,
      start_time: datetime.datetime,
@@ -340,7 +338,10 @@ def _plot_emoji_data(  # pylint: disable=too-many-locals
      )
      all_counts.extend(counts)

-     colors = [cm.viridis(i) for i in numpy.linspace(0, 1, len(reactions_values_dict))]  # pylint: disable=no-member
+     colors = [
+         colormaps["viridis"](i)
+         for i in numpy.linspace(0, 1, len(reactions_values_dict))
+     ]

      first_emoji = True
      for i, (emoji, dict_counts) in enumerate(reactions_values_dict.items()):
@@ -369,9 +370,9 @@ def _plot_emoji_data(  # pylint: disable=too-many-locals


  def emojis_per_time(
-     period_of_time: models.TimePeriod,
+     period_of_time: TimePeriod,
      end_time: Optional[datetime.datetime] = None,
- ) -> matplotlib.figure.Figure:
+ ) -> figure.Figure:
      """
      Generate a visualization of overall emoji feedback
      over a specified time period.
@@ -396,13 +397,13 @@ def emojis_per_time(
      start_time = period_of_time.get_period_start_time(end_time)
      reactions_values_dict = _collect_emoji_data(start_time, plot_def)

-     fig, ax = matplotlib.pyplot.subplots(figsize=(12, 6))
+     fig, ax = pyplot.subplots(figsize=(12, 6))

      emoji_lines, emoji_labels = _plot_emoji_data(
          ax, reactions_values_dict, plot_def, start_time, end_time
      )

-     matplotlib.pyplot.title(
+     pyplot.title(
          f"Emoji feedback ({start_time.strftime(plot_def.time_format)} "
          f"to {end_time.strftime(plot_def.time_format)})"
      )
@@ -419,11 +420,11 @@ def emojis_per_time(
      ax.set_ylabel("Count")

      # Format x-axis
-     ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter(plot_def.time_format))
+     ax.xaxis.set_major_formatter(dates.DateFormatter(plot_def.time_format))
      ax.xaxis.set_major_locator(plot_def.locator)
      ax.tick_params(axis="x", labelrotation=45)
      ax.grid(True, alpha=0.3)

-     matplotlib.pyplot.tight_layout()
+     pyplot.tight_layout()

      return fig
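One detail worth calling out in the plotting changes: the `cm.viridis` shortcut is replaced by the colormap registry lookup, which also removes the need for the `pylint: disable=no-member` comment. A standalone sketch of the equivalent lookup:

```python
# Standalone illustration of the registry-based colormap lookup.
import numpy
from matplotlib import colormaps

# colormaps["viridis"] returns a Colormap; calling it with values in [0, 1]
# yields RGBA tuples, just like the old cm.viridis shortcut.
colors = [colormaps["viridis"](i) for i in numpy.linspace(0, 1, 4)]
print(colors[0])
```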
@@ -0,0 +1,12 @@
+ # This file holds patterns you want to skip during log parsing.
+ # By default, no patterns are supplied.
+ # Patterns are to be specified as values of dictionary,
+ # with each key being a descriptive name of the pattern.
+ # Patterns themselves are evaluated as a regular expression.
+ # Make sure to avoid regular expressions that may be interpreted
+ # as yaml syntax.
+ # Example:
+
+ # contains_capital_a: "^.*A.*"
+ # starts_with_numeric: "^[0-9].*"
+ child_exit_code_zero: "Child return code was: 0"
@@ -8,7 +8,7 @@ import numpy as np
  import yaml

  from llama_cpp import Llama, CreateCompletionResponse, CreateCompletionStreamResponse
- from logdetective.models import PromptConfig
+ from logdetective.models import PromptConfig, SkipSnippets
  from logdetective.remote_log import RemoteLog


@@ -179,7 +179,7 @@ def format_snippets(snippets: list[str] | list[Tuple[int, str]]) -> str:
          summary += f"""
  Snippet No. {i}:

- {s[1]}
+ {s}
  ================
  """
      return summary
@@ -198,8 +198,11 @@ def load_prompts(path: str | None) -> PromptConfig:


  def prompt_to_messages(
-     user_message: str, system_prompt: str | None = None,
-     system_role: str = "developer", user_role: str = "user") -> List[Dict[str, str]]:
+     user_message: str,
+     system_prompt: str | None = None,
+     system_role: str = "developer",
+     user_role: str = "user",
+ ) -> List[Dict[str, str]]:
      """Turn prompt into list of message dictionaries.
      If `system_role` and `user_role` are the same, only a single message is created,
      as concatenation of `user_message` and `system_prompt`. This is useful for models which
@@ -208,22 +211,39 @@ def prompt_to_messages(

      if system_role == user_role:
          messages = [
-             {
-                 "role": system_role,
-                 "content": f"{system_prompt}\n{user_message}"
-             }
+             {"role": system_role, "content": f"{system_prompt}\n{user_message}"}
          ]
      else:
-
          messages = [
-             {
-                 "role": system_role,
-                 "content": system_prompt
-             },
+             {"role": system_role, "content": system_prompt},
              {
                  "role": user_role,
                  "content": user_message,
-             }
+             },
          ]

      return messages
+
+
+ def filter_snippet_patterns(snippet: str, skip_snippets: SkipSnippets) -> bool:
+     """Try to match the snippet against provided patterns to determine if we should
+     filter it out or not."""
+     for key, pattern in skip_snippets.snippet_patterns.items():
+         if pattern.match(snippet):
+             LOG.debug("Snippet `%s` has matched against skip pattern %s", snippet, key)
+             return True
+
+     return False
+
+
+ def load_skip_snippet_patterns(path: str | None) -> SkipSnippets:
+     """Load dictionary of snippet patterns we want to skip."""
+     if path:
+         try:
+             with open(path, "r") as file:
+                 return SkipSnippets(yaml.safe_load(file))
+         except OSError as e:
+             LOG.error("Couldn't open file with snippet skip patterns `%s`", path)
+             raise e
+
+     return SkipSnippets({})
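A minimal sketch of the two new helpers together, loading the default pattern file shipped in this release (the path construction mirrors `server/config.py` above; the second sample line is invented):

```python
# Minimal sketch; assumes the default skip_snippets.yml shipped with the
# package, whose only active pattern is child_exit_code_zero.
import os
import logdetective
from logdetective.utils import load_skip_snippet_patterns, filter_snippet_patterns

path = f"{os.path.dirname(logdetective.__file__)}/skip_snippets.yml"
skip = load_skip_snippet_patterns(path)

print(filter_snippet_patterns("Child return code was: 0", skip))    # True: skipped
print(filter_snippet_patterns("error: undefined reference", skip))  # False: kept
```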
@@ -1,6 +1,6 @@
  [tool.poetry]
  name = "logdetective"
- version = "1.4.0"
+ version = "1.6.0"
  description = "Log using LLM AI to search for build/test failures and provide ideas for fixing these."
  authors = ["Jiri Podivin <jpodivin@gmail.com>"]
  license = "Apache-2.0"
@@ -15,7 +15,7 @@ packages = [
      { include = "logdetective" }
  ]
  classifiers = [
-     "Development Status :: 4 - Beta",
+     "Development Status :: 5 - Production/Stable",
      "Environment :: Console",
      "Intended Audience :: Developers",
      "License :: OSI Approved :: Apache Software License",
@@ -1,105 +0,0 @@
- import os
- import logging
- from typing import Tuple
-
- import drain3
- from drain3.template_miner_config import TemplateMinerConfig
- from llama_cpp import Llama, LlamaGrammar
-
- from logdetective.constants import SUMMARIZATION_PROMPT_TEMPLATE
- from logdetective.utils import get_chunks
-
- LOG = logging.getLogger("logdetective")
-
-
- class LLMExtractor:
-     """
-     A class that extracts relevant information from logs using a language model.
-     """
-
-     def __init__(
-         self,
-         model: Llama,
-         n_lines: int = 2,
-         prompt: str = SUMMARIZATION_PROMPT_TEMPLATE,
-     ):
-         self.model = model
-         self.n_lines = n_lines
-         self.grammar = LlamaGrammar.from_string(
-             'root ::= ("Yes" | "No")', verbose=False
-         )
-         self.prompt = prompt
-
-     def __call__(
-         self, log: str, n_lines: int = 2, neighbors: bool = False
-     ) -> list[str]:
-         chunks = self.rate_chunks(log)
-         out = self.create_extract(chunks, neighbors)
-         return out
-
-     def rate_chunks(self, log: str) -> list[tuple]:
-         """Scan log by the model and store results.
-
-         :param log: log file content
-         """
-         results = []
-         log_lines = log.split("\n")
-
-         for i in range(0, len(log_lines), self.n_lines):
-             block = "\n".join(log_lines[i: i + self.n_lines])
-             prompt = self.prompt.format(log)
-             out = self.model(prompt, max_tokens=7, grammar=self.grammar)
-             out = f"{out['choices'][0]['text']}\n"
-             results.append((block, out))
-
-         return results
-
-     def create_extract(self, chunks: list[tuple], neighbors: bool = False) -> list[str]:
-         """Extract interesting chunks from the model processing."""
-         interesting = []
-         summary = []
-         # pylint: disable=consider-using-enumerate
-         for i in range(len(chunks)):
-             if chunks[i][1].startswith("Yes"):
-                 interesting.append(i)
-                 if neighbors:
-                     interesting.extend([max(i - 1, 0), min(i + 1, len(chunks) - 1)])
-
-         interesting = set(interesting)
-
-         for i in interesting:
-             summary.append(chunks[i][0])
-
-         return summary
-
-
- class DrainExtractor:
-     """A class that extracts information from logs using a template miner algorithm."""
-
-     def __init__(self, verbose: bool = False, context: bool = False, max_clusters=8):
-         config = TemplateMinerConfig()
-         config.load(f"{os.path.dirname(__file__)}/drain3.ini")
-         config.profiling_enabled = verbose
-         config.drain_max_clusters = max_clusters
-         self.miner = drain3.TemplateMiner(config=config)
-         self.verbose = verbose
-         self.context = context
-
-     def __call__(self, log: str) -> list[Tuple[int, str]]:
-         out = []
-         # First pass create clusters
-         for _, chunk in get_chunks(log):
-             processed_chunk = self.miner.add_log_message(chunk)
-             LOG.debug(processed_chunk)
-         # Sort found clusters by size, descending order
-         sorted_clusters = sorted(
-             self.miner.drain.clusters, key=lambda it: it.size, reverse=True
-         )
-         # Second pass, only matching lines with clusters,
-         # to recover original text
-         for chunk_start, chunk in get_chunks(log):
-             cluster = self.miner.match(chunk, "always")
-             if cluster in sorted_clusters:
-                 out.append((chunk_start, chunk))
-                 sorted_clusters.remove(cluster)
-         return out