logdetective 0.5.10.tar.gz → 0.5.11.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {logdetective-0.5.10 → logdetective-0.5.11}/PKG-INFO +28 -4
- {logdetective-0.5.10 → logdetective-0.5.11}/README.md +27 -3
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/constants.py +8 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/logdetective.py +8 -1
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/prompts.yml +6 -0
- logdetective-0.5.11/logdetective/server/database/models.py +390 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/server/metric.py +4 -6
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/server/models.py +11 -3
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/server/plot.py +114 -39
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/server/server.py +104 -11
- logdetective-0.5.10/logdetective/server/templates/gitlab_comment.md.j2 → logdetective-0.5.11/logdetective/server/templates/gitlab_full_comment.md.j2 +1 -3
- logdetective-0.5.11/logdetective/server/templates/gitlab_short_comment.md.j2 +53 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/server/utils.py +3 -1
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/utils.py +7 -3
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective.1.asciidoc +2 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/pyproject.toml +1 -1
- logdetective-0.5.10/logdetective/server/database/models.py +0 -186
- {logdetective-0.5.10 → logdetective-0.5.11}/LICENSE +0 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/__init__.py +0 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/drain3.ini +0 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/extractors.py +0 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/models.py +0 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/server/__init__.py +0 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/server/database/__init__.py +0 -0
- {logdetective-0.5.10 → logdetective-0.5.11}/logdetective/server/database/base.py +0 -0
--- logdetective-0.5.10/PKG-INFO
+++ logdetective-0.5.11/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: logdetective
-Version: 0.5.10
+Version: 0.5.11
 Summary: Log using LLM AI to search for build/test failures and provide ideas for fixing these.
 License: Apache-2.0
 Author: Jiri Podivin
@@ -47,6 +47,8 @@ Log Detective
 
 A Python tool to analyze logs using a Language Model (LLM) and Drain template miner.
 
+Note: if you are looking for code of website logdetective.com it is in [github.com/fedora-copr/logdetective-website](https://github.com/fedora-copr/logdetective-website).
+
 Installation
 ------------
 
@@ -95,6 +97,17 @@ Example you want to use a different model:
 logdetective https://example.com/logs.txt --model https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_S.gguf?download=true
 logdetective https://example.com/logs.txt --model QuantFactory/Meta-Llama-3-8B-Instruct-GGUF
 
+Example of different suffix (useful for models that were quantized)
+
+logdetective https://kojipkgs.fedoraproject.org//work/tasks/3367/131313367/build.log --model 'fedora-copr/granite-3.2-8b-instruct-GGUF' -F Q4_K.gguf
+
+Example of altered prompts:
+
+cp ~/.local/lib/python3.13/site-packages/logdetective/prompts.yml ~/my-prompts.yml
+vi ~/my-prompts.yml # edit the prompts there to better fit your needs
+logdetective https://kojipkgs.fedoraproject.org//work/tasks/3367/131313367/build.log --prompts ~/my-prompts.yml
+
+
 Note that streaming with some models (notably Meta-Llama-3 is broken) is broken and can be workarounded by `no-stream` option:
 
 logdetective https://example.com/logs.txt --model QuantFactory/Meta-Llama-3-8B-Instruct-GGUF --no-stream
@@ -337,11 +350,23 @@ certbot certonly --standalone -d logdetective01.fedorainfracloud.org
 Querying statistics
 -------------------
 
-You can retrieve statistics about server requests over a specified time period
-using either the `curl`
+You can retrieve statistics about server requests and responses over a specified time period
+using either a browser, the `curl` or the `http` command (provided by the `httpie` package).
 
 When no time period is specified, the query defaults to the last 2 days:
 
+You can view requests and responses statistics
+- for the `/analyze` endpoint at http://localhost:8080/metrics/analyze
+- for the `/analyze/staged` endpoint at http://localhost:8080/metrics/analyze/staged.
+
+You can retrieve single svg images at the following endpoints:
+- `/metrics/analyze/requests`
+- `/metrics/analyze/responses`
+- `/metrics/analyze/staged/requests`
+- `/metrics/analyze/stages/responses`
+
+Examples:
+
 ```
 http GET "localhost:8080/metrics/analyze/requests" > /tmp/plot.svg
 curl "localhost:8080/metrics/analyze/staged/requests" > /tmp/plot.svg
@@ -349,7 +374,6 @@ curl "localhost:8080/metrics/analyze/staged/requests" > /tmp/plot.svg
 
 You can specify the time period in hours, days, or weeks.
 The time period:
-
 - cannot be less than one hour
 - cannot be negative
 - ends at the current time (when the query is made)
--- logdetective-0.5.10/README.md
+++ logdetective-0.5.11/README.md
@@ -7,6 +7,8 @@ Log Detective
 
 A Python tool to analyze logs using a Language Model (LLM) and Drain template miner.
 
+Note: if you are looking for code of website logdetective.com it is in [github.com/fedora-copr/logdetective-website](https://github.com/fedora-copr/logdetective-website).
+
 Installation
 ------------
 
@@ -55,6 +57,17 @@ Example you want to use a different model:
 logdetective https://example.com/logs.txt --model https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_S.gguf?download=true
 logdetective https://example.com/logs.txt --model QuantFactory/Meta-Llama-3-8B-Instruct-GGUF
 
+Example of different suffix (useful for models that were quantized)
+
+logdetective https://kojipkgs.fedoraproject.org//work/tasks/3367/131313367/build.log --model 'fedora-copr/granite-3.2-8b-instruct-GGUF' -F Q4_K.gguf
+
+Example of altered prompts:
+
+cp ~/.local/lib/python3.13/site-packages/logdetective/prompts.yml ~/my-prompts.yml
+vi ~/my-prompts.yml # edit the prompts there to better fit your needs
+logdetective https://kojipkgs.fedoraproject.org//work/tasks/3367/131313367/build.log --prompts ~/my-prompts.yml
+
+
 Note that streaming with some models (notably Meta-Llama-3 is broken) is broken and can be workarounded by `no-stream` option:
 
 logdetective https://example.com/logs.txt --model QuantFactory/Meta-Llama-3-8B-Instruct-GGUF --no-stream
@@ -297,11 +310,23 @@ certbot certonly --standalone -d logdetective01.fedorainfracloud.org
 Querying statistics
 -------------------
 
-You can retrieve statistics about server requests over a specified time period
-using either the `curl`
+You can retrieve statistics about server requests and responses over a specified time period
+using either a browser, the `curl` or the `http` command (provided by the `httpie` package).
 
 When no time period is specified, the query defaults to the last 2 days:
 
+You can view requests and responses statistics
+- for the `/analyze` endpoint at http://localhost:8080/metrics/analyze
+- for the `/analyze/staged` endpoint at http://localhost:8080/metrics/analyze/staged.
+
+You can retrieve single svg images at the following endpoints:
+- `/metrics/analyze/requests`
+- `/metrics/analyze/responses`
+- `/metrics/analyze/staged/requests`
+- `/metrics/analyze/stages/responses`
+
+Examples:
+
 ```
 http GET "localhost:8080/metrics/analyze/requests" > /tmp/plot.svg
 curl "localhost:8080/metrics/analyze/staged/requests" > /tmp/plot.svg
@@ -309,7 +334,6 @@ curl "localhost:8080/metrics/analyze/staged/requests" > /tmp/plot.svg
 
 You can specify the time period in hours, days, or weeks.
 The time period:
-
 - cannot be less than one hour
 - cannot be negative
 - ends at the current time (when the query is made)
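The plots above can also be fetched programmatically. A minimal Python sketch, assuming a server on localhost:8080 as in the README examples; only the endpoint path comes from the diff, the `requests` usage and output path are illustrative:

```python
# Fetch one of the metrics plots added in 0.5.11 and save the SVG it returns.
import requests

resp = requests.get("http://localhost:8080/metrics/analyze/requests", timeout=30)
resp.raise_for_status()
with open("/tmp/plot.svg", "wb") as f:
    f.write(resp.content)
```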
--- logdetective-0.5.10/logdetective/constants.py
+++ logdetective-0.5.11/logdetective/constants.py
@@ -16,6 +16,8 @@ Snippets are delimited with '================'.
 
 Finally, drawing on information from all snippets, provide complete explanation of the issue and recommend solution.
 
+Explanation of the issue, and recommended solution, should take handful of sentences.
+
 Snippets:
 
 {}
@@ -38,6 +40,8 @@ Answer:
 SNIPPET_PROMPT_TEMPLATE = """
 Analyse following RPM build log snippet. Describe contents accurately, without speculation or suggestions for resolution.
 
+Your analysis must be as concise as possible, while keeping relevant information intact.
+
 Snippet:
 
 {}
@@ -55,6 +59,8 @@ Snippets are delimited with '================'.
 
 Drawing on information from all snippets, provide complete explanation of the issue and recommend solution.
 
+Explanation of the issue, and recommended solution, should take handful of sentences.
+
 Snippets:
 
 {}
@@ -64,3 +70,5 @@ Analysis:
 """
 
 SNIPPET_DELIMITER = "================"
+
+DEFAULT_TEMPERATURE = 0.8
--- logdetective-0.5.10/logdetective/logdetective.py
+++ logdetective-0.5.11/logdetective/logdetective.py
@@ -3,7 +3,7 @@ import logging
 import sys
 import os
 
-from logdetective.constants import DEFAULT_ADVISOR
+from logdetective.constants import DEFAULT_ADVISOR, DEFAULT_TEMPERATURE
 from logdetective.utils import (
     process_log,
     initialize_model,
@@ -73,6 +73,12 @@ def setup_args():
         default=f"{os.path.dirname(__file__)}/prompts.yml",
         help="Path to prompt configuration file."
     )
+    parser.add_argument(
+        "--temperature",
+        type=float,
+        default=DEFAULT_TEMPERATURE,
+        help="Temperature for inference."
+    )
     return parser.parse_args()
 
 
@@ -147,6 +153,7 @@ def main():  # pylint: disable=too-many-statements,too-many-locals
         model,
         stream,
         prompt_template=prompts_configuration.prompt_template,
+        temperature=args.temperature,
    )
    probs = []
    print("Explanation:")
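With this release the CLI gains a `--temperature` option, defaulting to the new `DEFAULT_TEMPERATURE = 0.8`. A usage sketch; the value 0.2 is purely illustrative:

```
logdetective https://example.com/logs.txt --temperature 0.2
```

Lower values should make the explanation more deterministic, higher values more varied.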
--- logdetective-0.5.10/logdetective/prompts.yml
+++ logdetective-0.5.11/logdetective/prompts.yml
@@ -13,6 +13,8 @@ prompt_template: |
 
   Finally, drawing on information from all snippets, provide complete explanation of the issue and recommend solution.
 
+  Explanation of the issue, and recommended solution, should take handful of sentences.
+
   Snippets:
 
   {}
@@ -33,6 +35,8 @@ summarization_prompt_template: |
 snippet_prompt_template: |
   Analyse following RPM build log snippet. Describe contents accurately, without speculation or suggestions for resolution.
 
+  Your analysis must be as concise as possible, while keeping relevant information intact.
+
   Snippet:
 
   {}
@@ -48,6 +52,8 @@ prompt_template_staged: |
 
   Drawing on information from all snippets, provide complete explanation of the issue and recommend solution.
 
+  Explanation of the issue, and recommended solution, should take handful of sentences.
+
   Snippets:
 
   {}
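For reference, a custom prompts file passed with `--prompts` is just this YAML with reworded templates. A hypothetical fragment; the key names mirror prompts.yml above, and the `{}` placeholder is where logdetective substitutes the snippets:

```yaml
# Hypothetical ~/my-prompts.yml; only the wording differs from the shipped file.
prompt_template: |
  Explain the failure in this build log in a handful of sentences.

  Snippets:

  {}
```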
--- /dev/null
+++ logdetective-0.5.11/logdetective/server/database/models.py
@@ -0,0 +1,390 @@
+import enum
+import datetime
+
+from typing import Optional
+from sqlalchemy import (
+    Column,
+    Integer,
+    Float,
+    DateTime,
+    String,
+    Enum,
+    func,
+    select,
+    distinct,
+)
+
+from logdetective.server.database.base import Base, transaction
+
+
+class EndpointType(enum.Enum):
+    """Different analyze endpoints"""
+
+    ANALYZE = "analyze_log"
+    ANALYZE_STAGED = "analyze_log_staged"
+    ANALYZE_STREAM = "analyze_log_stream"
+
+
+class AnalyzeRequestMetrics(Base):
+    """Store data related to received requests and given responses"""
+
+    __tablename__ = "analyze_request_metrics"
+
+    id = Column(Integer, primary_key=True)
+    endpoint = Column(
+        Enum(EndpointType),
+        nullable=False,
+        index=True,
+        comment="The service endpoint that was called",
+    )
+    request_received_at = Column(
+        DateTime,
+        nullable=False,
+        index=True,
+        default=datetime.datetime.now(datetime.timezone.utc),
+        comment="Timestamp when the request was received",
+    )
+    log_url = Column(
+        String,
+        nullable=False,
+        index=False,
+        comment="Log url for which analysis was requested",
+    )
+    response_sent_at = Column(
+        DateTime, nullable=True, comment="Timestamp when the response was sent back"
+    )
+    response_length = Column(
+        Integer, nullable=True, comment="Length of the response in chars"
+    )
+    response_certainty = Column(
+        Float, nullable=True, comment="Certainty for generated response"
+    )
+
+    @classmethod
+    def create(
+        cls,
+        endpoint: EndpointType,
+        log_url: str,
+        request_received_at: Optional[datetime.datetime] = None,
+    ) -> int:
+        """Create AnalyzeRequestMetrics new line
+        with data related to a received request"""
+        with transaction(commit=True) as session:
+            metrics = AnalyzeRequestMetrics()
+            metrics.endpoint = endpoint
+            metrics.request_received_at = request_received_at or datetime.datetime.now(
+                datetime.timezone.utc
+            )
+            metrics.log_url = log_url
+            session.add(metrics)
+            session.flush()
+            return metrics.id
+
+    @classmethod
+    def update(
+        cls,
+        id_: int,
+        response_sent_at: datetime,
+        response_length: int,
+        response_certainty: float,
+    ) -> None:
+        """Update an AnalyzeRequestMetrics line
+        with data related to the given response"""
+        with transaction(commit=True) as session:
+            metrics = session.query(AnalyzeRequestMetrics).filter_by(id=id_).first()
+            metrics.response_sent_at = response_sent_at
+            metrics.response_length = response_length
+            metrics.response_certainty = response_certainty
+            session.add(metrics)
+
+    @classmethod
+    def get_postgres_time_format(cls, time_format):
+        """Map python time format in the PostgreSQL format."""
+        if time_format == "%Y-%m-%d":
+            pgsql_time_format = "YYYY-MM-DD"
+        else:
+            pgsql_time_format = "YYYY-MM-DD HH24"
+        return pgsql_time_format
+
+    @classmethod
+    def get_dictionary_with_datetime_keys(
+        cls, time_format: str, values_dict: dict[str, any]
+    ) -> dict[datetime.datetime, any]:
+        """Convert from a dictionary with str keys to a dictionary with datetime keys"""
+        new_dict = {
+            datetime.datetime.strptime(r[0], time_format): r[1] for r in values_dict
+        }
+        return new_dict
+
+    @classmethod
+    def _get_requests_by_time_for_postgres(
+        cls, start_time, end_time, time_format, endpoint
+    ):
+        """Get total requests number in time period.
+
+        func.to_char is PostgreSQL specific.
+        Let's unit tests replace this function with the SQLite version.
+        """
+        pgsql_time_format = cls.get_postgres_time_format(time_format)
+
+        requests_by_time_format = (
+            select(
+                cls.id,
+                func.to_char(cls.request_received_at, pgsql_time_format).label(
+                    "time_format"
+                ),
+            )
+            .filter(cls.request_received_at.between(start_time, end_time))
+            .filter(cls.endpoint == endpoint)
+            .cte("requests_by_time_format")
+        )
+        return requests_by_time_format
+
+    @classmethod
+    def _get_requests_by_time_for_sqlite(
+        cls, start_time, end_time, time_format, endpoint
+    ):
+        """Get total requests number in time period.
+
+        func.strftime is SQLite specific.
+        Use this function in unit test using flexmock:
+
+        flexmock(AnalyzeRequestMetrics).should_receive("_get_requests_by_time_for_postgres")
+        .replace_with(AnalyzeRequestMetrics._get_requests_by_time_for_sqllite)
+        """
+        requests_by_time_format = (
+            select(
+                cls.id,
+                func.strftime(time_format, cls.request_received_at).label(
+                    "time_format"
+                ),
+            )
+            .filter(cls.request_received_at.between(start_time, end_time))
+            .filter(cls.endpoint == endpoint)
+            .cte("requests_by_time_format")
+        )
+        return requests_by_time_format
+
+    @classmethod
+    def get_requests_in_period(
+        cls,
+        start_time: datetime.datetime,
+        end_time: datetime.datetime,
+        time_format: str,
+        endpoint: Optional[EndpointType] = EndpointType.ANALYZE,
+    ) -> dict[datetime.datetime, int]:
+        """
+        Get a dictionary with request counts grouped by time units within a specified period.
+
+        Args:
+            start_time (datetime): The start of the time period to query
+            end_time (datetime): The end of the time period to query
+            time_format (str): The strftime format string to format timestamps (e.g., '%Y-%m-%d')
+            endpoint (EndpointType): The analyze API endpoint to query
+
+        Returns:
+            dict[datetime, int]: A dictionary mapping datetime objects to request counts
+        """
+        with transaction(commit=False) as session:
+            requests_by_time_format = cls._get_requests_by_time_for_postgres(
+                start_time, end_time, time_format, endpoint
+            )
+
+            count_requests_by_time_format = select(
+                requests_by_time_format.c.time_format,
+                func.count(distinct(requests_by_time_format.c.id)),  # pylint: disable=not-callable
+            ).group_by("time_format")
+
+            counts = session.execute(count_requests_by_time_format)
+            results = counts.fetchall()
+
+            return cls.get_dictionary_with_datetime_keys(time_format, results)
+
+    @classmethod
+    def _get_average_responses_times_for_postgres(
+        cls, start_time, end_time, time_format, endpoint
+    ):
+        """Get average responses time.
+
+        func.to_char is PostgreSQL specific.
+        Let's unit tests replace this function with the SQLite version.
+        """
+        with transaction(commit=False) as session:
+            pgsql_time_format = cls.get_postgres_time_format(time_format)
+
+            average_responses_times = (
+                select(
+                    func.to_char(cls.request_received_at, pgsql_time_format).label(
+                        "time_range"
+                    ),
+                    (
+                        func.avg(
+                            func.extract(  # pylint: disable=not-callable
+                                "epoch", cls.response_sent_at - cls.request_received_at
+                            )
+                        )
+                    ).label("average_response_seconds"),
+                )
+                .filter(cls.request_received_at.between(start_time, end_time))
+                .filter(cls.endpoint == endpoint)
+                .group_by("time_range")
+                .order_by("time_range")
+            )
+
+            results = session.execute(average_responses_times).fetchall()
+            return results
+
+    @classmethod
+    def _get_average_responses_times_for_sqlite(
+        cls, start_time, end_time, time_format, endpoint
+    ):
+        """Get average responses time.
+
+        func.strftime is SQLite specific.
+        Use this function in unit test using flexmock:
+
+        flexmock(AnalyzeRequestMetrics).should_receive("_get_average_responses_times_for_postgres")
+        .replace_with(AnalyzeRequestMetrics._get_average_responses_times_for_sqlite)
+        """
+        with transaction(commit=False) as session:
+            average_responses_times = (
+                select(
+                    func.strftime(time_format, cls.request_received_at).label(
+                        "time_range"
+                    ),
+                    (
+                        func.avg(
+                            func.julianday(cls.response_sent_at)
+                            - func.julianday(cls.request_received_at)  # noqa: W503 flake8 vs ruff
+                        )
+                        * 86400  # noqa: W503 flake8 vs ruff
+                    ).label("average_response_seconds"),
+                )
+                .filter(cls.request_received_at.between(start_time, end_time))
+                .filter(cls.endpoint == endpoint)
+                .group_by("time_range")
+                .order_by("time_range")
+            )
+
+            results = session.execute(average_responses_times).fetchall()
+            return results
+
+    @classmethod
+    def get_responses_average_time_in_period(
+        cls,
+        start_time: datetime.datetime,
+        end_time: datetime.datetime,
+        time_format: str,
+        endpoint: Optional[EndpointType] = EndpointType.ANALYZE,
+    ) -> dict[datetime.datetime, int]:
+        """
+        Get a dictionary with average responses times
+        grouped by time units within a specified period.
+
+        Args:
+            start_time (datetime): The start of the time period to query
+            end_time (datetime): The end of the time period to query
+            time_format (str): The strftime format string to format timestamps (e.g., '%Y-%m-%d')
+            endpoint (EndpointType): The analyze API endpoint to query
+
+        Returns:
+            dict[datetime, int]: A dictionary mapping datetime objects
+            to average responses times
+        """
+        with transaction(commit=False) as _:
+            average_responses_times = cls._get_average_responses_times_for_postgres(
+                start_time, end_time, time_format, endpoint
+            )
+
+            return cls.get_dictionary_with_datetime_keys(
+                time_format, average_responses_times
+            )
+
+    @classmethod
+    def _get_average_responses_lengths_for_postgres(
+        cls, start_time, end_time, time_format, endpoint
+    ):
+        """Get average responses length.
+
+        func.to_char is PostgreSQL specific.
+        Let's unit tests replace this function with the SQLite version.
+        """
+        with transaction(commit=False) as session:
+            pgsql_time_format = cls.get_postgres_time_format(time_format)
+
+            average_responses_lengths = (
+                select(
+                    func.to_char(cls.request_received_at, pgsql_time_format).label(
+                        "time_range"
+                    ),
+                    (func.avg(cls.response_length)).label("average_responses_length"),
+                )
+                .filter(cls.request_received_at.between(start_time, end_time))
+                .filter(cls.endpoint == endpoint)
+                .group_by("time_range")
+                .order_by("time_range")
+            )
+
+            results = session.execute(average_responses_lengths).fetchall()
+            return results
+
+    @classmethod
+    def _get_average_responses_lengths_for_sqlite(
+        cls, start_time, end_time, time_format, endpoint
+    ):
+        """Get average responses length.
+
+        func.strftime is SQLite specific.
+        Use this function in unit test using flexmock:
+
+        flexmock(AnalyzeRequestMetrics)
+        .should_receive("_get_average_responses_lengths_for_postgres")
+        .replace_with(AnalyzeRequestMetrics._get_average_responses_lengths_for_sqlite)
+        """
+        with transaction(commit=False) as session:
+            average_responses_lengths = (
+                select(
+                    func.strftime(time_format, cls.request_received_at).label(
+                        "time_range"
+                    ),
+                    (func.avg(cls.response_length)).label("average_responses_length"),
+                )
+                .filter(cls.request_received_at.between(start_time, end_time))
+                .filter(cls.endpoint == endpoint)
+                .group_by("time_range")
+                .order_by("time_range")
+            )
+
+            results = session.execute(average_responses_lengths).fetchall()
+            return results
+
+    @classmethod
+    def get_responses_average_length_in_period(
+        cls,
+        start_time: datetime.datetime,
+        end_time: datetime.datetime,
+        time_format: str,
+        endpoint: Optional[EndpointType] = EndpointType.ANALYZE,
+    ) -> dict[datetime.datetime, int]:
+        """
+        Get a dictionary with average responses length
+        grouped by time units within a specified period.
+
+        Args:
+            start_time (datetime): The start of the time period to query
+            end_time (datetime): The end of the time period to query
+            time_format (str): The strftime format string to format timestamps (e.g., '%Y-%m-%d')
+            endpoint (EndpointType): The analyze API endpoint to query
+
+        Returns:
+            dict[datetime, int]: A dictionary mapping datetime objects
+            to average responses lengths
+        """
+        with transaction(commit=False) as _:
+            average_responses_lengths = cls._get_average_responses_lengths_for_postgres(
+                start_time, end_time, time_format, endpoint
+            )
+
+            return cls.get_dictionary_with_datetime_keys(
+                time_format, average_responses_lengths
+            )
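Taken together, the new model brackets each request: `create()` records the incoming call and returns a row id, `update()` later attaches the response data, and the `get_*_in_period()` classmethods feed the metrics plots. A sketch of that lifecycle with illustrative values (not code from the package):

```python
import datetime

from logdetective.server.database.models import AnalyzeRequestMetrics, EndpointType

now = datetime.datetime.now(datetime.timezone.utc)

# Record the incoming request; create() returns the new row's id.
metrics_id = AnalyzeRequestMetrics.create(
    endpoint=EndpointType.ANALYZE,
    log_url="https://example.com/build.log",  # illustrative URL
)

# ... run the analysis, then attach the response data to the same row.
AnalyzeRequestMetrics.update(
    id_=metrics_id,
    response_sent_at=datetime.datetime.now(datetime.timezone.utc),
    response_length=1024,      # illustrative
    response_certainty=42.0,   # illustrative
)

# Aggregate per day, e.g. over the last two days (the README default period).
counts = AnalyzeRequestMetrics.get_requests_in_period(
    start_time=now - datetime.timedelta(days=2),
    end_time=now,
    time_format="%Y-%m-%d",
)
```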
--- logdetective-0.5.10/logdetective/server/metric.py
+++ logdetective-0.5.11/logdetective/server/metric.py
@@ -41,12 +41,10 @@ def update_metrics(
         sent_at if sent_at else datetime.datetime.now(datetime.timezone.utc)
     )
     response_length = None
-    if hasattr(response, "explanation") and
-
-
-
-        if "text" in choice
-    )
+    if hasattr(response, "explanation") and isinstance(
+        response.explanation, models.Explanation
+    ):
+        response_length = len(response.explanation.text)
     response_certainty = (
         response.response_certainty if hasattr(response, "response_certainty") else None
     )
--- logdetective-0.5.10/logdetective/server/models.py
+++ logdetective-0.5.11/logdetective/server/models.py
@@ -2,7 +2,9 @@ import datetime
 from logging import BASIC_FORMAT
 from typing import List, Dict, Optional, Literal
 
-from pydantic import BaseModel, Field, model_validator, field_validator
+from pydantic import BaseModel, Field, model_validator, field_validator, NonNegativeFloat
+
+from logdetective.constants import DEFAULT_TEMPERATURE
 
 
 class BuildLog(BaseModel):
@@ -95,6 +97,8 @@ class InferenceConfig(BaseModel):
     )
     url: str = ""
     api_token: str = ""
+    model: str = ""
+    temperature: NonNegativeFloat = DEFAULT_TEMPERATURE
 
     def __init__(self, data: Optional[dict] = None):
         super().__init__()
@@ -106,6 +110,8 @@ class InferenceConfig(BaseModel):
         self.api_endpoint = data.get("api_endpoint", "/chat/completions")
         self.url = data.get("url", "")
         self.api_token = data.get("api_token", "")
+        self.model = data.get("model", "default-model")
+        self.temperature = data.get("temperature", DEFAULT_TEMPERATURE)
 
 
 class ExtractorConfig(BaseModel):
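Since `InferenceConfig.__init__` reads these values from a plain dict, the matching server config fragment would plausibly look like this; the enclosing `inference:` key and all values are assumptions, only the key names come from the diff (`model` and `temperature` are new in 0.5.11):

```yaml
inference:                        # enclosing key is an assumption
  url: "http://localhost:8000"    # illustrative
  api_endpoint: "/chat/completions"
  api_token: ""
  model: "default-model"          # new in 0.5.11
  temperature: 0.8                # new in 0.5.11; validated as NonNegativeFloat
```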
@@ -150,7 +156,8 @@ class LogConfig(BaseModel):
     """Logging configuration"""
 
     name: str = "logdetective"
-
+    level_stream: str | int = "INFO"
+    level_file: str | int = "INFO"
     path: str | None = None
     format: str = BASIC_FORMAT
 
@@ -160,7 +167,8 @@ class LogConfig(BaseModel):
             return
 
         self.name = data.get("name", "logdetective")
-        self.
+        self.level_stream = data.get("level_stream", "INFO").upper()
+        self.level_file = data.get("level_file", "INFO").upper()
         self.path = data.get("path")
         self.format = data.get("format", BASIC_FORMAT)
 
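Likewise, `LogConfig` now takes separate stream and file log levels instead of a single one. A hypothetical config fragment (the enclosing `log:` key and the values are assumptions; the key names come from the diff):

```yaml
log:                                   # enclosing key is an assumption
  name: logdetective
  level_stream: DEBUG                  # console verbosity, new in 0.5.11
  level_file: INFO                     # file verbosity, new in 0.5.11
  path: /var/log/logdetective.log      # illustrative
  format: "%(levelname)s:%(name)s:%(message)s"
```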