logdetective 1.4.0.tar.gz → 1.5.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {logdetective-1.4.0 → logdetective-1.5.0}/PKG-INFO +7 -4
- {logdetective-1.4.0 → logdetective-1.5.0}/README.md +5 -2
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/constants.py +0 -11
- logdetective-1.5.0/logdetective/extractors.py +42 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/logdetective.py +19 -22
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/models.py +0 -5
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/prompts.yml +0 -11
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/remote_log.py +1 -3
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/config.py +3 -4
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/emoji.py +3 -1
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/plot.py +1 -1
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/utils.py +9 -13
- {logdetective-1.4.0 → logdetective-1.5.0}/pyproject.toml +2 -2
- logdetective-1.4.0/logdetective/extractors.py +0 -105
- {logdetective-1.4.0 → logdetective-1.5.0}/LICENSE +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/__init__.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/drain3.ini +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/prompts-summary-first.yml +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/prompts-summary-only.yml +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/__init__.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/compressors.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/database/__init__.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/database/base.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/database/models/__init__.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/database/models/merge_request_jobs.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/database/models/metrics.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/gitlab.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/llm.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/metric.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/models.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/server.py +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/templates/gitlab_full_comment.md.j2 +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/templates/gitlab_short_comment.md.j2 +0 -0
- {logdetective-1.4.0 → logdetective-1.5.0}/logdetective.1.asciidoc +0 -0
{logdetective-1.4.0 → logdetective-1.5.0}/PKG-INFO

````diff
@@ -1,12 +1,12 @@
 Metadata-Version: 2.3
 Name: logdetective
-Version: 1.4.0
+Version: 1.5.0
 Summary: Log using LLM AI to search for build/test failures and provide ideas for fixing these.
 License: Apache-2.0
 Author: Jiri Podivin
 Author-email: jpodivin@gmail.com
 Requires-Python: >=3.11,<4.0
-Classifier: Development Status ::
+Classifier: Development Status :: 5 - Production/Stable
 Classifier: Environment :: Console
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: Apache Software License
@@ -87,8 +87,8 @@ Usage
 To analyze a log file, run the script with the following command line arguments:
 - `url` (required): The URL of the log file to be analyzed.
 - `--model` (optional, default: "Mistral-7B-Instruct-v0.2-GGUF"): The path or URL of the language model for analysis. As we are using LLama.cpp we want this to be in the `gguf` format. You can include the download link to the model here. If the model is already on your machine it will skip the download.
-- `--summarizer` (optional, default: "drain"): Choose between LLM and Drain template miner as the log summarizer. You can also provide the path to an existing language model file instead of using a URL.
-- `--n_lines` (optional, default: 8): The number of lines per chunk for LLM analysis. This only makes sense when you are summarizing with LLM.
+- `--summarizer` DISABLED: LLM summarization option was removed. Argument is kept for backward compatibility only.(optional, default: "drain"): Choose between LLM and Drain template miner as the log summarizer. You can also provide the path to an existing language model file instead of using a URL.
+- `--n_lines` DISABLED: LLM summarization option was removed. Argument is kept for backward compatibility only. (optional, default: 8): The number of lines per chunk for LLM analysis. This only makes sense when you are summarizing with LLM.
 - `--n_clusters` (optional, default 8): Number of clusters for Drain to organize log chunks into. This only makes sense when you are summarizing with Drain
 
 Example usage:
@@ -376,6 +376,9 @@ HTTPS certificate generated through:
 certbot certonly --standalone -d logdetective01.fedorainfracloud.org
 ```
 
+Certificates need to be be placed into location specified by the`LOGDETECTIVE_CERTDIR`
+env var and the service should be restarted.
+
 Querying statistics
 -------------------
 
````
{logdetective-1.4.0 → logdetective-1.5.0}/README.md

````diff
@@ -43,8 +43,8 @@ Usage
 To analyze a log file, run the script with the following command line arguments:
 - `url` (required): The URL of the log file to be analyzed.
 - `--model` (optional, default: "Mistral-7B-Instruct-v0.2-GGUF"): The path or URL of the language model for analysis. As we are using LLama.cpp we want this to be in the `gguf` format. You can include the download link to the model here. If the model is already on your machine it will skip the download.
-- `--summarizer` (optional, default: "drain"): Choose between LLM and Drain template miner as the log summarizer. You can also provide the path to an existing language model file instead of using a URL.
-- `--n_lines` (optional, default: 8): The number of lines per chunk for LLM analysis. This only makes sense when you are summarizing with LLM.
+- `--summarizer` DISABLED: LLM summarization option was removed. Argument is kept for backward compatibility only.(optional, default: "drain"): Choose between LLM and Drain template miner as the log summarizer. You can also provide the path to an existing language model file instead of using a URL.
+- `--n_lines` DISABLED: LLM summarization option was removed. Argument is kept for backward compatibility only. (optional, default: 8): The number of lines per chunk for LLM analysis. This only makes sense when you are summarizing with LLM.
 - `--n_clusters` (optional, default 8): Number of clusters for Drain to organize log chunks into. This only makes sense when you are summarizing with Drain
 
 Example usage:
@@ -332,6 +332,9 @@ HTTPS certificate generated through:
 certbot certonly --standalone -d logdetective01.fedorainfracloud.org
 ```
 
+Certificates need to be be placed into location specified by the`LOGDETECTIVE_CERTDIR`
+env var and the service should be restarted.
+
 Querying statistics
 -------------------
 
````
{logdetective-1.4.0 → logdetective-1.5.0}/logdetective/constants.py

```diff
@@ -26,17 +26,6 @@ Analysis:
 
 """
 
-SUMMARIZATION_PROMPT_TEMPLATE = """
-Does following log contain error or issue?
-
-Log:
-
-{}
-
-Answer:
-
-"""
-
 SNIPPET_PROMPT_TEMPLATE = """
 Analyse following RPM build log snippet. Describe contents accurately, without speculation or suggestions for resolution.
 
```
logdetective-1.5.0/logdetective/extractors.py (new file)

```diff
@@ -0,0 +1,42 @@
+import os
+import logging
+from typing import Tuple
+
+import drain3
+from drain3.template_miner_config import TemplateMinerConfig
+
+from logdetective.utils import get_chunks
+
+LOG = logging.getLogger("logdetective")
+
+
+class DrainExtractor:
+    """A class that extracts information from logs using a template miner algorithm."""
+
+    def __init__(self, verbose: bool = False, context: bool = False, max_clusters=8):
+        config = TemplateMinerConfig()
+        config.load(f"{os.path.dirname(__file__)}/drain3.ini")
+        config.profiling_enabled = verbose
+        config.drain_max_clusters = max_clusters
+        self.miner = drain3.TemplateMiner(config=config)
+        self.verbose = verbose
+        self.context = context
+
+    def __call__(self, log: str) -> list[Tuple[int, str]]:
+        out = []
+        # First pass create clusters
+        for _, chunk in get_chunks(log):
+            processed_chunk = self.miner.add_log_message(chunk)
+            LOG.debug(processed_chunk)
+        # Sort found clusters by size, descending order
+        sorted_clusters = sorted(
+            self.miner.drain.clusters, key=lambda it: it.size, reverse=True
+        )
+        # Second pass, only matching lines with clusters,
+        # to recover original text
+        for chunk_start, chunk in get_chunks(log):
+            cluster = self.miner.match(chunk, "always")
+            if cluster in sorted_clusters:
+                out.append((chunk_start, chunk))
+                sorted_clusters.remove(cluster)
+        return out
```
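The new `DrainExtractor` works in two passes: the first pass feeds every chunk to the Drain template miner to build clusters, and the second pass re-matches chunks against the clusters (largest first) to recover one original snippet per cluster together with its starting position. A minimal usage sketch; the log content here is made up for illustration:

```python
from logdetective.extractors import DrainExtractor

# Hypothetical failing build log; any multi-line string works.
log_text = "\n".join(
    [
        "Installing dependencies",
        "gcc -O2 -c foo.c",
        "foo.c:12: error: expected ';' before 'return'",
        "make: *** [Makefile:10: foo.o] Error 1",
    ]
)

# max_clusters caps how many distinct log templates (and therefore
# how many snippets) are kept; verbose=True would enable drain3 profiling.
extractor = DrainExtractor(verbose=False, context=True, max_clusters=8)

# The extractor is callable and yields (chunk_start, chunk) tuples.
for chunk_start, chunk in extractor(log_text):
    print(chunk_start, chunk)
```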
{logdetective-1.4.0 → logdetective-1.5.0}/logdetective/logdetective.py

```diff
@@ -15,7 +15,7 @@ from logdetective.utils import (
     compute_certainty,
     load_prompts,
 )
-from logdetective.extractors import
+from logdetective.extractors import DrainExtractor
 
 LOG = logging.getLogger("logdetective")
 
@@ -49,16 +49,16 @@ def setup_args():
         "--summarizer",
         type=str,
         default="drain",
-        help="
-
+        help="DISABLED: LLM summarization option was removed. \
+            Argument is kept for backward compatibility only.",
     )
     parser.add_argument(
         "-N",
         "--n_lines",
         type=int,
-        default=
-        help="
-
+        default=None,
+        help="DISABLED: LLM summarization option was removed. \
+            Argument is kept for backward compatibility only.",
     )
     parser.add_argument(
         "-C",
@@ -74,13 +74,13 @@ def setup_args():
         "--prompts",
         type=str,
         default=f"{os.path.dirname(__file__)}/prompts.yml",
-        help="Path to prompt configuration file."
+        help="Path to prompt configuration file.",
     )
     parser.add_argument(
         "--temperature",
         type=float,
         default=DEFAULT_TEMPERATURE,
-        help="Temperature for inference."
+        help="Temperature for inference.",
     )
     return parser.parse_args()
 
@@ -93,6 +93,10 @@ async def run():  # pylint: disable=too-many-statements,too-many-locals
         sys.stderr.write("Error: --quiet and --verbose is mutually exclusive.\n")
         sys.exit(2)
 
+    # Emit warning about use of discontinued args
+    if args.n_lines or args.summarizer != "drain":
+        LOG.warning("LLM based summarization was removed. Drain will be used instead.")
+
     # Logging facility setup
     log_level = logging.INFO
     if args.verbose >= 1:
@@ -116,18 +120,10 @@ async def run():  # pylint: disable=too-many-statements,too-many-locals
         LOG.error("You likely do not have enough memory to load the AI model")
         sys.exit(3)
 
-    # Log file summarizer
-
-
-
-        )
-    else:
-        summarizer_model = initialize_model(args.summarizer, verbose=args.verbose > 2)
-        extractor = LLMExtractor(
-            summarizer_model,
-            args.verbose > 1,
-            prompts_configuration.summarization_prompt_template,
-        )
+    # Log file summarizer initialization
+    extractor = DrainExtractor(
+        args.verbose > 1, context=True, max_clusters=args.n_clusters
+    )
 
     LOG.info("Getting summary")
 
@@ -151,7 +147,8 @@ async def run():  # pylint: disable=too-many-statements,too-many-locals
 
     prompt = (
         f"{prompts_configuration.default_system_prompt}\n"
-        f"{prompts_configuration.prompt_template}"
+        f"{prompts_configuration.prompt_template}"
+    )
 
     stream = True
     if args.no_stream:
@@ -191,7 +188,7 @@ async def run():  # pylint: disable=too-many-statements,too-many-locals
 
 
 def main():
-    """
+    """Evaluate logdetective program and wait for it to finish"""
     asyncio.run(run())
 
 
```
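With the `LLMExtractor` branch gone, `run()` has a single summarization path. A condensed, hypothetical sketch of the new wiring (`format_snippets` is the helper from `logdetective.utils`; downloading the log and calling the model are omitted here):

```python
from logdetective.extractors import DrainExtractor
from logdetective.utils import format_snippets


def summarize_log(log_text: str, n_clusters: int = 8, verbose: bool = False) -> str:
    # Mirrors the extractor initialization shown in the diff above.
    extractor = DrainExtractor(verbose, context=True, max_clusters=n_clusters)
    snippets = extractor(log_text)  # list of (line_number, chunk) tuples
    # The snippets are then formatted for embedding into the final prompt.
    return format_snippets(snippets)
```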
{logdetective-1.4.0 → logdetective-1.5.0}/logdetective/models.py

```diff
@@ -4,7 +4,6 @@ from pydantic import BaseModel
 from logdetective.constants import (
     PROMPT_TEMPLATE,
     PROMPT_TEMPLATE_STAGED,
-    SUMMARIZATION_PROMPT_TEMPLATE,
     SNIPPET_PROMPT_TEMPLATE,
     DEFAULT_SYSTEM_PROMPT,
 )
@@ -14,7 +13,6 @@ class PromptConfig(BaseModel):
     """Configuration for basic log detective prompts."""
 
     prompt_template: str = PROMPT_TEMPLATE
-    summarization_prompt_template: str = SUMMARIZATION_PROMPT_TEMPLATE
     snippet_prompt_template: str = SNIPPET_PROMPT_TEMPLATE
     prompt_template_staged: str = PROMPT_TEMPLATE_STAGED
 
@@ -27,9 +25,6 @@ class PromptConfig(BaseModel):
         if data is None:
             return
         self.prompt_template = data.get("prompt_template", PROMPT_TEMPLATE)
-        self.summarization_prompt_template = data.get(
-            "summarization_prompt_template", SUMMARIZATION_PROMPT_TEMPLATE
-        )
         self.snippet_prompt_template = data.get(
             "snippet_prompt_template", SNIPPET_PROMPT_TEMPLATE
         )
```
{logdetective-1.4.0 → logdetective-1.5.0}/logdetective/prompts.yml

```diff
@@ -21,17 +21,6 @@ prompt_template: |
 
   Analysis:
 
-
-summarization_prompt_template: |
-  Does following log contain error or issue?
-
-  Log:
-
-  {}
-
-  Answer:
-
-
 snippet_prompt_template: |
   Analyse following RPM build log snippet. Describe contents accurately, without speculation or suggestions for resolution.
 
```
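With `summarization_prompt_template` gone from both `PromptConfig` and the shipped `prompts.yml`, a custom prompt file only needs the remaining keys; anything missing falls back to the defaults in `logdetective.constants`. A sketch with a made-up minimal file (not the shipped templates):

```python
import yaml

from logdetective.utils import load_prompts

# Hypothetical minimal prompt file using only keys that remain in 1.5.0.
with open("my-prompts.yml", "w", encoding="utf-8") as f:
    yaml.safe_dump(
        {
            "prompt_template": "Explain this log:\n\n{}\n\nAnalysis:",
            "snippet_prompt_template": "Describe this snippet:\n\n{}",
        },
        f,
    )

prompts = load_prompts("my-prompts.yml")  # returns a PromptConfig
print(prompts.prompt_template)
```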
{logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/config.py

```diff
@@ -52,11 +52,10 @@ def get_log(config: Config):
 
 
 def get_openai_api_client(ineference_config: InferenceConfig):
-    """Set up AsyncOpenAI client with default configuration.
-    """
+    """Set up AsyncOpenAI client with default configuration."""
     return AsyncOpenAI(
-        api_key=ineference_config.api_token,
-
+        api_key=ineference_config.api_token, base_url=ineference_config.url
+    )
 
 
 SERVER_CONFIG_PATH = os.environ.get("LOGDETECTIVE_SERVER_CONF", None)
```
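The reformatted `get_openai_api_client` still just constructs an `AsyncOpenAI` client from the inference config's token and URL. For reference, a sketch of how such a client is typically used with the standard `openai` package; the endpoint, token, and model name are placeholders, not values from logdetective:

```python
import asyncio

from openai import AsyncOpenAI


async def main() -> None:
    # Placeholder values standing in for InferenceConfig.api_token / .url.
    client = AsyncOpenAI(api_key="dummy-token", base_url="http://localhost:8000/v1")
    response = await client.chat.completions.create(
        model="placeholder-model",
        messages=[{"role": "user", "content": "Why did this build fail?"}],
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```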
{logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/emoji.py

```diff
@@ -51,7 +51,9 @@ async def _handle_gitlab_operation(func: Callable, *args):
         else:
             LOG.exception(log_msg)
     except Exception as e:  # pylint: disable=broad-exception-caught
-        LOG.exception(
+        LOG.exception(
+            "Unexpected error during GitLab operation %s(%s): %s", func, args, e
+        )
 
 
 async def collect_emojis_in_comments(  # pylint: disable=too-many-locals
```
{logdetective-1.4.0 → logdetective-1.5.0}/logdetective/server/plot.py

```diff
@@ -340,7 +340,7 @@ def _plot_emoji_data(  # pylint: disable=too-many-locals
     )
     all_counts.extend(counts)
 
-    colors = [cm.viridis(i) for i in numpy.linspace(0, 1, len(reactions_values_dict))]
+    colors = [cm.viridis(i) for i in numpy.linspace(0, 1, len(reactions_values_dict))]  # pylint: disable=no-member
 
     first_emoji = True
     for i, (emoji, dict_counts) in enumerate(reactions_values_dict.items()):
```
{logdetective-1.4.0 → logdetective-1.5.0}/logdetective/utils.py

```diff
@@ -179,7 +179,7 @@ def format_snippets(snippets: list[str] | list[Tuple[int, str]]) -> str:
         summary += f"""
         Snippet No. {i}:
 
-        {s
+        {s}
         ================
         """
     return summary
@@ -198,8 +198,11 @@ def load_prompts(path: str | None) -> PromptConfig:
 
 
 def prompt_to_messages(
-
-
+    user_message: str,
+    system_prompt: str | None = None,
+    system_role: str = "developer",
+    user_role: str = "user",
+) -> List[Dict[str, str]]:
     """Turn prompt into list of message dictionaries.
     If `system_role` and `user_role` are the same, only a single message is created,
     as concatenation of `user_message` and `system_prompt`. This is useful for models which
@@ -208,22 +211,15 @@ def prompt_to_messages(
     """
     if system_role == user_role:
         messages = [
-            {
-                "role": system_role,
-                "content": f"{system_prompt}\n{user_message}"
-            }
+            {"role": system_role, "content": f"{system_prompt}\n{user_message}"}
         ]
     else:
-
         messages = [
-            {
-                "role": system_role,
-                "content": system_prompt
-            },
+            {"role": system_role, "content": system_prompt},
             {
                 "role": user_role,
                 "content": user_message,
-            }
+            },
         ]
 
     return messages
```
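The multi-line `prompt_to_messages` signature shown above makes both message shapes explicit. A small sketch using the defaults from the diff:

```python
from logdetective.utils import prompt_to_messages

# Distinct roles: system prompt and user message stay separate.
print(prompt_to_messages("Explain this failure.", system_prompt="Be terse."))
# [{'role': 'developer', 'content': 'Be terse.'},
#  {'role': 'user', 'content': 'Explain this failure.'}]

# Identical roles: both parts are concatenated into a single message,
# useful for models without a dedicated system/developer role.
print(
    prompt_to_messages(
        "Explain this failure.",
        system_prompt="Be terse.",
        system_role="user",
    )
)
# [{'role': 'user', 'content': 'Be terse.\nExplain this failure.'}]
```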
{logdetective-1.4.0 → logdetective-1.5.0}/pyproject.toml

```diff
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "logdetective"
-version = "1.4.0"
+version = "1.5.0"
 description = "Log using LLM AI to search for build/test failures and provide ideas for fixing these."
 authors = ["Jiri Podivin <jpodivin@gmail.com>"]
 license = "Apache-2.0"
@@ -15,7 +15,7 @@ packages = [
     { include = "logdetective" }
 ]
 classifiers = [
-    "Development Status ::
+    "Development Status :: 5 - Production/Stable",
     "Environment :: Console",
     "Intended Audience :: Developers",
     "License :: OSI Approved :: Apache Software License",
```
logdetective-1.4.0/logdetective/extractors.py (removed)

```diff
@@ -1,105 +0,0 @@
-import os
-import logging
-from typing import Tuple
-
-import drain3
-from drain3.template_miner_config import TemplateMinerConfig
-from llama_cpp import Llama, LlamaGrammar
-
-from logdetective.constants import SUMMARIZATION_PROMPT_TEMPLATE
-from logdetective.utils import get_chunks
-
-LOG = logging.getLogger("logdetective")
-
-
-class LLMExtractor:
-    """
-    A class that extracts relevant information from logs using a language model.
-    """
-
-    def __init__(
-        self,
-        model: Llama,
-        n_lines: int = 2,
-        prompt: str = SUMMARIZATION_PROMPT_TEMPLATE,
-    ):
-        self.model = model
-        self.n_lines = n_lines
-        self.grammar = LlamaGrammar.from_string(
-            'root ::= ("Yes" | "No")', verbose=False
-        )
-        self.prompt = prompt
-
-    def __call__(
-        self, log: str, n_lines: int = 2, neighbors: bool = False
-    ) -> list[str]:
-        chunks = self.rate_chunks(log)
-        out = self.create_extract(chunks, neighbors)
-        return out
-
-    def rate_chunks(self, log: str) -> list[tuple]:
-        """Scan log by the model and store results.
-
-        :param log: log file content
-        """
-        results = []
-        log_lines = log.split("\n")
-
-        for i in range(0, len(log_lines), self.n_lines):
-            block = "\n".join(log_lines[i: i + self.n_lines])
-            prompt = self.prompt.format(log)
-            out = self.model(prompt, max_tokens=7, grammar=self.grammar)
-            out = f"{out['choices'][0]['text']}\n"
-            results.append((block, out))
-
-        return results
-
-    def create_extract(self, chunks: list[tuple], neighbors: bool = False) -> list[str]:
-        """Extract interesting chunks from the model processing."""
-        interesting = []
-        summary = []
-        # pylint: disable=consider-using-enumerate
-        for i in range(len(chunks)):
-            if chunks[i][1].startswith("Yes"):
-                interesting.append(i)
-                if neighbors:
-                    interesting.extend([max(i - 1, 0), min(i + 1, len(chunks) - 1)])
-
-        interesting = set(interesting)
-
-        for i in interesting:
-            summary.append(chunks[i][0])
-
-        return summary
-
-
-class DrainExtractor:
-    """A class that extracts information from logs using a template miner algorithm."""
-
-    def __init__(self, verbose: bool = False, context: bool = False, max_clusters=8):
-        config = TemplateMinerConfig()
-        config.load(f"{os.path.dirname(__file__)}/drain3.ini")
-        config.profiling_enabled = verbose
-        config.drain_max_clusters = max_clusters
-        self.miner = drain3.TemplateMiner(config=config)
-        self.verbose = verbose
-        self.context = context
-
-    def __call__(self, log: str) -> list[Tuple[int, str]]:
-        out = []
-        # First pass create clusters
-        for _, chunk in get_chunks(log):
-            processed_chunk = self.miner.add_log_message(chunk)
-            LOG.debug(processed_chunk)
-        # Sort found clusters by size, descending order
-        sorted_clusters = sorted(
-            self.miner.drain.clusters, key=lambda it: it.size, reverse=True
-        )
-        # Second pass, only matching lines with clusters,
-        # to recover original text
-        for chunk_start, chunk in get_chunks(log):
-            cluster = self.miner.match(chunk, "always")
-            if cluster in sorted_clusters:
-                out.append((chunk_start, chunk))
-                sorted_clusters.remove(cluster)
-        return out
```