evalscope 0.6.0rc0__tar.gz → 0.6.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of evalscope might be problematic.
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/PKG-INFO +8 -7
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/README.md +6 -5
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/opencompass/tasks/eval_datasets.py +1 -1
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/tasks/testset_generation.py +120 -100
- evalscope-0.6.1/evalscope/backend/rag_eval/utils/clip.py +149 -0
- evalscope-0.6.1/evalscope/backend/rag_eval/utils/embedding.py +183 -0
- evalscope-0.6.1/evalscope/backend/rag_eval/utils/llm.py +72 -0
- evalscope-0.6.1/evalscope/backend/rag_eval/utils/tools.py +63 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/metrics/bundled_rouge_score/rouge_scorer.py +1 -1
- evalscope-0.6.1/evalscope/preprocess/tokenizers/__init__.py +0 -0
- evalscope-0.6.1/evalscope/version.py +4 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope.egg-info/PKG-INFO +8 -7
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope.egg-info/SOURCES.txt +5 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope.egg-info/requires.txt +2 -2
- evalscope-0.6.0rc0/evalscope/version.py +0 -4
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/base.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/opencompass/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/opencompass/api_meta_template.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/opencompass/backend_manager.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/opencompass/tasks/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/opencompass/tasks/eval_api.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/backend_manager.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/clip_benchmark/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/clip_benchmark/arguments.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/clip_benchmark/dataset_builder.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/clip_benchmark/task_template.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/clip_benchmark/tasks/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/clip_benchmark/tasks/image_caption.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/clip_benchmark/tasks/zeroshot_classification.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/clip_benchmark/tasks/zeroshot_retrieval.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/arguments.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/base.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/task_template.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/tasks/Classification.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/tasks/Clustering.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/tasks/CustomTask.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/tasks/PairClassification.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/tasks/Reranking.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/tasks/Retrieval.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/tasks/STS.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/cmteb/tasks/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/arguments.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/metrics/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/metrics/multi_modal_faithfulness.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/metrics/multi_modal_relevance.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/task_template.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/tasks/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/tasks/translate_prompt.py +0 -0
- {evalscope-0.6.0rc0/evalscope/perf → evalscope-0.6.1/evalscope/backend/rag_eval/utils}/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/vlm_eval_kit/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/vlm_eval_kit/backend_manager.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/vlm_eval_kit/custom_dataset.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/arc/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/arc/ai2_arc.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/arc/arc_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/bbh_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/boolean_expressions.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/causal_judgement.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/date_understanding.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/disambiguation_qa.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/dyck_languages.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/formal_fallacies.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/geometric_shapes.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/hyperbaton.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/logical_deduction_five_objects.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/logical_deduction_seven_objects.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/logical_deduction_three_objects.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/movie_recommendation.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/multistep_arithmetic_two.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/navigate.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/object_counting.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/penguins_in_a_table.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/reasoning_about_colored_objects.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/ruin_names.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/salient_translation_error_detection.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/snarks.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/sports_understanding.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/temporal_sequences.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/tracking_shuffled_objects_five_objects.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/tracking_shuffled_objects_seven_objects.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/tracking_shuffled_objects_three_objects.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/web_of_lies.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/bbh/cot_prompts/word_sorting.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/benchmark.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/ceval/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/ceval/ceval_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/ceval/ceval_exam.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/cmmlu/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/cmmlu/cmmlu.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/cmmlu/cmmlu_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/competition_math/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/competition_math/competition_math.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/competition_math/competition_math_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/data_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/general_qa/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/general_qa/general_qa_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/gsm8k/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/gsm8k/gsm8k.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/gsm8k/gsm8k_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/hellaswag/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/hellaswag/hellaswag.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/hellaswag/hellaswag_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/humaneval/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/humaneval/humaneval.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/humaneval/humaneval_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/mmlu/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/mmlu/mmlu.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/mmlu/mmlu_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/race/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/race/race.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/race/race_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/trivia_qa/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/trivia_qa/trivia_qa.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/trivia_qa/trivia_qa_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/truthful_qa/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/truthful_qa/truthful_qa.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/benchmarks/truthful_qa/truthful_qa_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/cache.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/cli/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/cli/base.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/cli/cli.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/cli/start_perf.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/cli/start_server.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/config.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/constants.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/evaluator/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/evaluator/evaluator.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/evaluator/rating_eval.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/evaluator/reviewer/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/evaluator/reviewer/auto_reviewer.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/metrics/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/metrics/bundled_rouge_score/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/metrics/code_metric.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/metrics/math_accuracy.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/metrics/metrics.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/metrics/rouge_metric.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/api/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/api/openai_api.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/custom/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/custom/custom_model.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/dummy_chat_model.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/model.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/model_adapter.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/openai_model.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/models/template.py +0 -0
- {evalscope-0.6.0rc0/evalscope/perf/datasets → evalscope-0.6.1/evalscope/perf}/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/_logging.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/api_plugin_base.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/custom_api.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/dashscope_api.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/dataset_plugin_base.py +0 -0
- {evalscope-0.6.0rc0/evalscope/preprocess/tokenizers → evalscope-0.6.1/evalscope/perf/datasets}/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/datasets/line_by_line.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/datasets/longalpaca_12k.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/datasets/openqa.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/how_to_analysis_result.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/http_client.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/openai_api.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/plugin_registry.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/query_parameters.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/perf/server_sent_event.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/preprocess/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/preprocess/tokenizers/gpt2_tokenizer.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/arc.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/bbh.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/bbh_mini.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/ceval.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/ceval_mini.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/cmmlu.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/eval_qwen-7b-chat_v100.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/general_qa.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/gsm8k.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/mmlu.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/registry/tasks/mmlu_mini.yaml +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/run.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/run_arena.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/run_ms.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/summarizer.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/eval.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/infer.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/longbench_write.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/resources/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/resources/judge.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/resources/longbench_write.jsonl +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/resources/longbench_write_en.jsonl +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/resources/longwrite_ruler.jsonl +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/tools/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/tools/data_etl.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/longbench_write/utils.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/toolbench_static/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/toolbench_static/eval.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/toolbench_static/infer.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/toolbench_static/llm/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/toolbench_static/llm/swift_infer.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/third_party/toolbench_static/toolbench_static.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/tools/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/tools/combine_reports.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/tools/gen_mmlu_subject_mapping.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/tools/rewrite_eval_results.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/utils/__init__.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/utils/arena_utils.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/utils/completion_parsers.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/utils/logger.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/utils/task_cfg_parser.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/utils/task_utils.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/utils/utils.py +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope.egg-info/dependency_links.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope.egg-info/entry_points.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope.egg-info/not-zip-safe +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope.egg-info/top_level.txt +0 -0
- {evalscope-0.6.0rc0 → evalscope-0.6.1}/setup.cfg +0 -0
{evalscope-0.6.0rc0 → evalscope-0.6.1}/PKG-INFO
RENAMED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: evalscope
-Version: 0.6.0rc0
+Version: 0.6.1
 Summary: EvalScope: Lightweight LLMs Evaluation Framework
 Home-page: https://github.com/modelscope/evalscope
 Author: ModelScope team
@@ -53,7 +53,7 @@ Provides-Extra: vlmeval
 Requires-Dist: ms-vlmeval>=0.0.5; extra == "vlmeval"
 Provides-Extra: rag
 Requires-Dist: mteb==1.19.4; extra == "rag"
-Requires-Dist: ragas==0.2.
+Requires-Dist: ragas==0.2.5; extra == "rag"
 Requires-Dist: webdataset>0.2.0; extra == "rag"
 Provides-Extra: inner
 Requires-Dist: absl-py; extra == "inner"
@@ -118,7 +118,7 @@ Requires-Dist: rouge-chinese; extra == "all"
 Requires-Dist: ms-opencompass>=0.1.3; extra == "all"
 Requires-Dist: ms-vlmeval>=0.0.5; extra == "all"
 Requires-Dist: mteb==1.19.4; extra == "all"
-Requires-Dist: ragas==0.2.
+Requires-Dist: ragas==0.2.5; extra == "all"
 Requires-Dist: webdataset>0.2.0; extra == "all"
 
 
@@ -140,6 +140,7 @@ Requires-Dist: webdataset>0.2.0; extra == "all"
     <a href="https://evalscope.readthedocs.io/en/latest/">📖 Documents</a>
   <p>
 
+> ⭐ If you like this project, please click the "Star" button at the top right to support us. Your support is our motivation to keep going!
 
 ## 📋 Table of Contents
 - [Introduction](#introduction)
@@ -165,7 +166,7 @@ EvalScope is the official model evaluation and performance benchmarking framewor
 The architecture includes the following modules:
 1. **Model Adapter**: The model adapter is used to convert the outputs of specific models into the format required by the framework, supporting both API call models and locally run models.
 2. **Data Adapter**: The data adapter is responsible for converting and processing input data to meet various evaluation needs and formats.
-3. **Evaluation Backend**: 
+3. **Evaluation Backend**:
 - **Native**: EvalScope’s own **default evaluation framework**, supporting various evaluation modes, including single model evaluation, arena mode, baseline model comparison mode, etc.
 - **OpenCompass**: Supports [OpenCompass](https://github.com/open-compass/opencompass) as the evaluation backend, providing advanced encapsulation and task simplification, allowing you to submit tasks for evaluation more easily.
 - **VLMEvalKit**: Supports [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) as the evaluation backend, enabling easy initiation of multi-modal evaluation tasks, supporting various multi-modal models and datasets.
@@ -252,7 +253,7 @@ You can execute this command from any directory:
 python -m evalscope.run \
  --model qwen/Qwen2-0.5B-Instruct \
  --template-type qwen \
- --datasets arc 
+ --datasets arc
 ```
 
 #### Install from source
@@ -359,13 +360,13 @@ EvalScope supports using third-party evaluation frameworks to initiate evaluatio
 EvalScope supports custom dataset evaluation. For detailed information, please refer to the Custom Dataset Evaluation [📖User Guide](https://evalscope.readthedocs.io/en/latest/advanced_guides/custom_dataset.html)
 
 ## Offline Evaluation
-You can use local dataset to evaluate the model without internet connection. 
+You can use local dataset to evaluate the model without internet connection.
 
 Refer to: Offline Evaluation [📖 User Guide](https://evalscope.readthedocs.io/en/latest/user_guides/offline_evaluation.html)
 
 
 ## Arena Mode
-The Arena mode allows multiple candidate models to be evaluated through pairwise battles, and can choose to use the AI Enhanced Auto-Reviewer (AAR) automatic evaluation process or manual evaluation to obtain the evaluation report. 
+The Arena mode allows multiple candidate models to be evaluated through pairwise battles, and can choose to use the AI Enhanced Auto-Reviewer (AAR) automatic evaluation process or manual evaluation to obtain the evaluation report.
 
 Refer to: Arena Mode [📖 User Guide](https://evalscope.readthedocs.io/en/latest/user_guides/arena.html)
{evalscope-0.6.0rc0 → evalscope-0.6.1}/README.md
RENAMED
@@ -17,6 +17,7 @@
     <a href="https://evalscope.readthedocs.io/en/latest/">📖 Documents</a>
   <p>
 
+> ⭐ If you like this project, please click the "Star" button at the top right to support us. Your support is our motivation to keep going!
 
 ## 📋 Table of Contents
 - [Introduction](#introduction)
@@ -42,7 +43,7 @@ EvalScope is the official model evaluation and performance benchmarking framewor
 The architecture includes the following modules:
 1. **Model Adapter**: The model adapter is used to convert the outputs of specific models into the format required by the framework, supporting both API call models and locally run models.
 2. **Data Adapter**: The data adapter is responsible for converting and processing input data to meet various evaluation needs and formats.
-3. **Evaluation Backend**: 
+3. **Evaluation Backend**:
 - **Native**: EvalScope’s own **default evaluation framework**, supporting various evaluation modes, including single model evaluation, arena mode, baseline model comparison mode, etc.
 - **OpenCompass**: Supports [OpenCompass](https://github.com/open-compass/opencompass) as the evaluation backend, providing advanced encapsulation and task simplification, allowing you to submit tasks for evaluation more easily.
 - **VLMEvalKit**: Supports [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) as the evaluation backend, enabling easy initiation of multi-modal evaluation tasks, supporting various multi-modal models and datasets.
@@ -129,7 +130,7 @@ You can execute this command from any directory:
 python -m evalscope.run \
  --model qwen/Qwen2-0.5B-Instruct \
  --template-type qwen \
- --datasets arc 
+ --datasets arc
 ```
 
 #### Install from source
@@ -236,13 +237,13 @@ EvalScope supports using third-party evaluation frameworks to initiate evaluatio
 EvalScope supports custom dataset evaluation. For detailed information, please refer to the Custom Dataset Evaluation [📖User Guide](https://evalscope.readthedocs.io/en/latest/advanced_guides/custom_dataset.html)
 
 ## Offline Evaluation
-You can use local dataset to evaluate the model without internet connection. 
+You can use local dataset to evaluate the model without internet connection.
 
 Refer to: Offline Evaluation [📖 User Guide](https://evalscope.readthedocs.io/en/latest/user_guides/offline_evaluation.html)
 
 
 ## Arena Mode
-The Arena mode allows multiple candidate models to be evaluated through pairwise battles, and can choose to use the AI Enhanced Auto-Reviewer (AAR) automatic evaluation process or manual evaluation to obtain the evaluation report. 
+The Arena mode allows multiple candidate models to be evaluated through pairwise battles, and can choose to use the AI Enhanced Auto-Reviewer (AAR) automatic evaluation process or manual evaluation to obtain the evaluation report.
 
 Refer to: Arena Mode [📖 User Guide](https://evalscope.readthedocs.io/en/latest/user_guides/arena.html)
@@ -270,4 +271,4 @@ Refer to : Model Serving Performance Evaluation [📖 User Guide](https://evalsc
 
 ## Star History
 
-[](https://star-history.com/#modelscope/evalscope&Date) 
+[](https://star-history.com/#modelscope/evalscope&Date)
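The quickstart shown in the README hunks above (`python -m evalscope.run --model ... --datasets arc`) also has a Python entry point. A minimal sketch of the equivalent call, assuming the task-config keys simply mirror the CLI flags (the exact schema accepted by evalscope 0.6.x is not shown in this diff):

```python
# Hypothetical Python equivalent of the CLI quickstart above; the task_cfg
# keys mirror the CLI flags and are assumptions, not a documented schema.
from evalscope.run import run_task

task_cfg = {
    'model': 'qwen/Qwen2-0.5B-Instruct',  # --model
    'template_type': 'qwen',              # --template-type
    'datasets': ['arc'],                  # --datasets
}
run_task(task_cfg=task_cfg)
```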
{evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/opencompass/tasks/eval_datasets.py
RENAMED
@@ -51,12 +51,12 @@ with read_base():
     from opencompass.configs.datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
     from opencompass.configs.datasets.cmb.cmb_gen_dfb5c4 import cmb_datasets
     from opencompass.configs.datasets.cmmlu.cmmlu_gen_c13365 import cmmlu_datasets
-    from opencompass.configs.datasets.bbh.bbh_gen_5b92b0 import bbh_datasets
 
     # Note: to be supported
     # from opencompass.configs.datasets.flores.flores_gen_806ede import flores_datasets
     # from opencompass.configs.datasets.TheoremQA.TheoremQA_5shot_gen_6f0af8 import TheoremQA_datasets
     # from opencompass.configs.datasets.commonsenseqa.commonsenseqa_gen_c946f2 import commonsenseqa_datasets
+    # from opencompass.configs.datasets.bbh.bbh_gen_5b92b0 import bbh_datasets
 
 
 datasets = []
{evalscope-0.6.0rc0 → evalscope-0.6.1}/evalscope/backend/rag_eval/ragas/tasks/testset_generation.py
RENAMED
@@ -1,15 +1,15 @@
-import os
 import asyncio
+import os
+
 import pandas as pd
-from tqdm import tqdm
-from ragas.llms import LangchainLLMWrapper
 from ragas.embeddings import LangchainEmbeddingsWrapper
-from .translate_prompt import translate_prompts
-from evalscope.utils.logger import get_logger
-from evalscope.backend.rag_eval.ragas.arguments import TestsetGenerationArguments
-from evalscope.backend.rag_eval import EmbeddingModel, LLM, ChatOpenAI
+from ragas.llms import LangchainLLMWrapper
+from tqdm import tqdm
 
-
+from evalscope.backend.rag_eval import LLM, ChatOpenAI, EmbeddingModel
+from evalscope.backend.rag_eval.ragas.arguments import TestsetGenerationArguments
+from evalscope.utils.logger import get_logger
+from .translate_prompt import translate_prompts
 
 logger = get_logger()
 
@@ -17,116 +17,110 @@ logger = get_logger()
 def get_transform(llm, embedding, language):
     """
     Creates and returns a default set of transforms for processing a knowledge graph.
-
-    This function defines a series of transformation steps to be applied to a
-    knowledge graph, including extracting summaries, keyphrases, titles,
-    headlines, and embeddings, as well as building similarity relationships
-    between nodes.
-
-    The transforms are applied in the following order:
-    1. Parallel extraction of summaries and headlines
-    2. Embedding of summaries for document nodes
-    3. Splitting of headlines
-    4. Parallel extraction of embeddings, keyphrases, and titles
-    5. Building cosine similarity relationships between nodes
-    6. Building cosine similarity relationships between summaries
-
-    Returns
-    -------
-    Transforms
-        A list of transformation steps to be applied to the knowledge graph.
-
     """
     from ragas.testset.transforms.engine import Parallel
     from ragas.testset.transforms.extractors import (
         EmbeddingExtractor,
         HeadlinesExtractor,
-        KeyphrasesExtractor,
         SummaryExtractor,
-        TitleExtractor,
     )
-    from ragas.testset.transforms.
+    from ragas.testset.transforms.extractors.llm_based import NERExtractor, ThemesExtractor
+    from ragas.testset.transforms.relationship_builders import (
         CosineSimilarityBuilder,
-
+        OverlapScoreBuilder,
     )
     from ragas.testset.transforms.splitters import HeadlineSplitter
+    from ragas.testset.transforms.filters import CustomNodeFilter
     from ragas.testset.graph import NodeType
+    from ragas.utils import num_tokens_from_string
+
+    def summary_filter(node):
+        return (node.type == NodeType.DOCUMENT and num_tokens_from_string(node.properties['page_content']) > 500)
 
-
-
-
-    title_extractor = TitleExtractor(llm=llm)
+    summary_extractor = SummaryExtractor(llm=llm, filter_nodes=lambda node: summary_filter(node))
+    ner_extractor = NERExtractor(llm=llm, filter_nodes=lambda node: node.type == NodeType.CHUNK)
+    theme_extractor = ThemesExtractor(llm=llm)
     headline_extractor = HeadlinesExtractor(llm=llm)
 
     asyncio.run(
         translate_prompts(
             prompts=[
                 summary_extractor,
-
-
+                theme_extractor,
+                ner_extractor,
                 headline_extractor,
             ],
             target_lang=language,
            llm=llm,
             adapt_instruction=True,
-        )
-
+        ))
+
+    splitter = HeadlineSplitter(min_tokens=500)
 
-
-
-    cosine_sim_builder = CosineSimilarityBuilder(threshold=0.8)
-    summary_embedder = EmbeddingExtractor(
-        name='summary_embedder',
-        filter_nodes=lambda node: True if node.type == NodeType.DOCUMENT else False,
+    summary_emb_extractor = EmbeddingExtractor(
+        embedding_model=embedding,
         property_name='summary_embedding',
         embed_property_name='summary',
-
+        filter_nodes=lambda node: summary_filter(node),
     )
-    summary_cosine_sim_builder = SummaryCosineSimilarityBuilder(threshold=0.6)
 
-
+    cosine_sim_builder = CosineSimilarityBuilder(
+        property_name='summary_embedding',
+        new_property_name='summary_similarity',
+        threshold=0.7,
+        filter_nodes=lambda node: summary_filter(node),
+    )
+
+    ner_overlap_sim = OverlapScoreBuilder(threshold=0.01, filter_nodes=lambda node: node.type == NodeType.CHUNK)
+
+    node_filter = CustomNodeFilter(llm=llm, filter_nodes=lambda node: node.type == NodeType.CHUNK)
+
     transforms = [
-
-
-
-
-
-
+        headline_extractor,
+        splitter,
+        summary_extractor,
+        node_filter,
+        Parallel(summary_emb_extractor, theme_extractor, ner_extractor),
+        Parallel(cosine_sim_builder, ner_overlap_sim),
     ]
+
     return transforms
 
 
 def get_distribution(llm, distribution, language):
-    from ragas.testset.synthesizers.
-
-
+    from ragas.testset.synthesizers.multi_hop import (
+        MultiHopAbstractQuerySynthesizer,
+        MultiHopSpecificQuerySynthesizer,
     )
-    from ragas.testset.synthesizers.
+    from ragas.testset.synthesizers.single_hop.specific import (
+        SingleHopSpecificQuerySynthesizer, )
 
-
-
-
+    single_hop = SingleHopSpecificQuerySynthesizer(llm=llm)
+    multi_hop_abs = MultiHopAbstractQuerySynthesizer(llm=llm)
+    multi_hop_spec = MultiHopSpecificQuerySynthesizer(llm=llm)
 
     asyncio.run(
         translate_prompts(
             prompts=[
-
-
-
+                single_hop,
+                multi_hop_abs,
+                multi_hop_spec,
             ],
             target_lang=language,
             llm=llm,
             adapt_instruction=True,
-        )
-
-
-
-
-
-
+        ))
+
+    mapping = {
+        'simple': single_hop,
+        'multi_context': multi_hop_abs,
+        'reasoning': multi_hop_spec,
+    }
+
+    return [(mapping[key], distribution[key]) for key in mapping if key in distribution]
 
 
-def get_knowledge_graph(documents, transforms, local_file):
+def get_knowledge_graph(documents, transforms, local_file, run_config):
     from ragas.testset.graph import KnowledgeGraph, Node, NodeType
     from ragas.testset.transforms import apply_transforms
 
@@ -148,7 +142,7 @@ def get_knowledge_graph(documents, transforms, local_file):
     kg = KnowledgeGraph(nodes=nodes)
 
     # apply transforms and update the knowledge graph
-    apply_transforms(kg, transforms)
+    apply_transforms(kg, transforms, run_config=run_config)
 
     # save the knowledge graph
     output_path = os.path.dirname(local_file)
@@ -158,6 +152,39 @@ def get_knowledge_graph(documents, transforms, local_file):
     return kg
 
 
+def get_persona(llm, kg, language):
+    from evalscope.backend.rag_eval.ragas.prompts.persona_prompt import PersonaGenerationPromptZH
+    from ragas.testset.persona import generate_personas_from_kg, PersonaGenerationPrompt
+    from ragas.testset.graph import Node
+
+    def filter(node: Node) -> bool:
+        if (node.type.name == 'DOCUMENT' and node.properties.get('summary_embedding') is not None):
+            return True
+        else:
+            return False
+
+    if language == 'chinese':
+        persona_prompt = PersonaGenerationPromptZH()
+    else:
+        persona_prompt = PersonaGenerationPrompt()
+    # NOTE: can't translate this yet
+    # asyncio.run(
+    #     translate_prompts(
+    #         prompts=[persona_prompt],
+    #         target_lang=language,
+    #         llm=llm,
+    #         adapt_instruction=True,
+    #     ))
+
+    return generate_personas_from_kg(
+        llm=llm,
+        kg=kg,
+        num_personas=3,
+        persona_generation_prompt=persona_prompt,
+        filter_fn=filter,
+    )
+
+
 def load_data(file_path):
     from langchain_community.document_loaders import UnstructuredFileLoader
 
@@ -178,32 +205,31 @@ def generate_testset(args: TestsetGenerationArguments) -> None:
     generator_llm = LLM.load(**args.generator_llm)
     embeddings = EmbeddingModel.load(**args.embeddings)
 
+    wrapped_llm = LangchainLLMWrapper(generator_llm)
+    wrapped_embeddings = LangchainEmbeddingsWrapper(embeddings)
+
     # Change resulting question type distribution
-    distributions = get_distribution(
-        LangchainLLMWrapper(generator_llm), args.distribution, args.language
-    )
+    distributions = get_distribution(wrapped_llm, args.distribution, args.language)
 
+    run_config = RunConfig(timeout=600, max_retries=3, max_wait=120, max_workers=1, log_tenacity=True)
     # get transforms
     transforms = get_transform(
-
-
+        wrapped_llm,
+        wrapped_embeddings,
         args.language,
     )
 
     # get knowledge graph
-    knowledge_graph = get_knowledge_graph(documents, transforms, args.knowledge_graph)
+    knowledge_graph = get_knowledge_graph(documents, transforms, args.knowledge_graph, run_config)
 
-
-
-    )
+    persona_list = get_persona(llm=wrapped_llm, kg=knowledge_graph, language=args.language)
+
+    generator = TestsetGenerator(llm=wrapped_llm, knowledge_graph=knowledge_graph, persona_list=persona_list)
 
-    runconfig = RunConfig(
-        timeout=600, max_retries=3, max_wait=120, max_workers=1, log_tenacity=True
-    )
     testset = generator.generate(
         testset_size=args.test_size,
         query_distribution=distributions,
-        run_config=runconfig,
+        run_config=run_config,
         with_debugging_logs=True,
        raise_exceptions=True,
     )
@@ -212,9 +238,7 @@ def generate_testset(args: TestsetGenerationArguments) -> None:
     testset_df = testset.to_pandas()
     output_path = os.path.dirname(args.output_file)
     os.makedirs(output_path, exist_ok=True)
-    testset_df.to_json(
-        args.output_file, indent=4, index=False, orient='records', force_ascii=False
-    )
+    testset_df.to_json(args.output_file, indent=4, index=False, orient='records', force_ascii=False)
 
     # get answer
     testset_with_answer = get_answer(testset_df, generator_llm, args.language)
@@ -243,21 +267,17 @@ Answer:
         contexts = '\n'.join(row['reference_contexts'])
 
         # Combine question and contexts as input for the LLM
-        input_text = template.format(
-            language=language, question=question, contexts=contexts
-        )
+        input_text = template.format(language=language, question=question, contexts=contexts)
 
         # Generate the answer using the generator LLM
         answer = generator_llm.invoke(input_text)
         if isinstance(generator_llm, ChatOpenAI):
             answer = answer.content
-        items.append(
-
-
-
-
-
-            }
-        )
+        items.append({
+            'user_input': question,
+            'retrieved_contexts': row['reference_contexts'],
+            'response': answer,
+            'reference': row['reference'],
+        })
 
     return pd.DataFrame.from_dict(items)
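The testset_generation.py changes above move the ragas test-set pipeline onto the 0.2.5 API: single-hop/multi-hop query synthesizers replace the old question types, NER/theme extraction and overlap scoring are added to the knowledge-graph transforms, personas are generated from the graph, and one shared `RunConfig` is threaded through both `apply_transforms` and `generator.generate`. A minimal driver sketch; the `docs` field and the dict keys fed to `LLM.load`/`EmbeddingModel.load` are illustrative assumptions, while the remaining fields are the ones the diff itself reads from `TestsetGenerationArguments`:

```python
# Sketch only: `docs` and the loader dict keys are hypothetical; distribution
# keys 'simple'/'multi_context'/'reasoning' map to the single-hop, multi-hop
# abstract, and multi-hop specific synthesizers per get_distribution() above.
from evalscope.backend.rag_eval.ragas.arguments import TestsetGenerationArguments
from evalscope.backend.rag_eval.ragas.tasks.testset_generation import generate_testset

args = TestsetGenerationArguments(
    docs=['data/corpus/report.md'],                    # hypothetical input-documents field
    generator_llm={'model_name': 'qwen2-7b-instruct',  # forwarded to LLM.load(**...)
                   'api_base': 'http://127.0.0.1:8000/v1'},
    embeddings={'model_name': 'AI-ModelScope/bge-large-zh'},  # forwarded to EmbeddingModel.load(**...)
    distribution={'simple': 0.5, 'multi_context': 0.25, 'reasoning': 0.25},
    test_size=5,
    language='chinese',
    knowledge_graph='outputs/knowledge_graph.json',    # where the KG is cached
    output_file='outputs/testset.json',
)
generate_testset(args)  # builds the KG, generates personas, then questions and answers
```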
evalscope-0.6.1/evalscope/backend/rag_eval/utils/clip.py
@@ -0,0 +1,149 @@
+import os
+import torch
+import torch.nn.functional as F
+from typing import List
+from PIL import Image
+from evalscope.backend.rag_eval.utils.tools import download_model, PIL_to_base64
+from transformers import AutoModel, AutoProcessor
+from langchain_core.embeddings import Embeddings
+
+
+class VisionModel:
+    @staticmethod
+    def load(**kw):
+        api_base = kw.get("api_base", None)
+        if api_base:
+
+            return VLMAPI(
+                model_name=kw.get("model_name", ""),
+                openai_api_base=api_base,
+                openai_api_key=kw.get("api_key", "EMPTY"),
+                prompt=kw.get("prompt", None),
+            )
+        else:
+            return CLIPModel(**kw)
+
+
+class VLMAPI:
+    def __init__(self, model_name, openai_api_base, openai_api_key, prompt=None):
+        from langchain_openai import ChatOpenAI
+        from langchain_core.prompts import ChatPromptTemplate
+
+        self.model_name = model_name
+        self.model = ChatOpenAI(
+            model_name=model_name,
+            openai_api_base=openai_api_base,
+            openai_api_key=openai_api_key,
+        )
+        self.default_prompt = "Please describe this image in general. Directly provide the description, do not include prefix like 'This image depicts'"
+        self.prompt = ChatPromptTemplate.from_messages(
+            [
+                ("system", prompt if prompt else self.default_prompt),
+                (
+                    "user",
+                    [
+                        {
+                            "type": "image_url",
+                            "image_url": {"url": "data:image/jpeg;base64,{image_data}"},
+                        }
+                    ],
+                ),
+            ]
+        )
+        self.chain = self.prompt | self.model
+        self.transform = PIL_to_base64
+
+    def encode_image(self, images):
+        captions = []
+        for image in images:
+            response = self.chain.invoke({"image_data": image})
+            captions.append(response.content)
+        return captions
+
+
+class CLIPModel(Embeddings):
+    def __init__(
+        self,
+        model_name: str,
+        revision: str = "master",
+        hub="modelscope",
+        device="cpu",
+    ):
+        self.device = device
+        self.model_name = model_name
+        self.revision = revision
+
+        # Download the model if it doesn't exist locally
+        if not os.path.exists(model_name) and hub == "modelscope":
+            model_name = download_model(self.model_name, self.revision)
+
+        # Load the model and processor
+        self.model = AutoModel.from_pretrained(model_name).to(self.device)
+        self.processor = AutoProcessor.from_pretrained(model_name)
+        self.transform = self.processor.image_processor
+        self.tokenizer = self.processor.tokenizer
+
+    def encode_text(self, batch_texts: List[str] | List[List[str]]):
+        if isinstance(batch_texts[0], list):
+            batch_texts = [
+                text for _, texts in enumerate(batch_texts) for text in texts
+            ]
+        # Ensure that the input texts are within the token limit
+        max_length = self.tokenizer.model_max_length
+        if not max_length or max_length > 0xFFFFFF:
+            max_length = 512
+        encoded_inputs = self.tokenizer(
+            text=batch_texts,
+            max_length=max_length,
+            padding=True,
+            truncation=True,
+            return_tensors="pt",
+        )
+
+        inputs = {k: v.to(self.device) for k, v in encoded_inputs.items()}
+
+        with torch.no_grad():
+            text_features = self.model.get_text_features(**inputs)
+            text_features = F.normalize(text_features, p=2, dim=-1)
+        return text_features
+
+    def encode_image(self, image):
+        batch_images = torch.stack([d["pixel_values"][0] for d in image])
+        batch_images = batch_images.to(self.device)
+        with torch.no_grad():
+            image_features = self.model.get_image_features(batch_images)
+            image_features = F.normalize(image_features, p=2, dim=-1)
+        return image_features
+
+    def embed_documents(self, texts):
+        text_features = self.encode_text(texts)
+        return text_features.cpu().numpy().tolist()
+
+    def embed_query(self, text):
+        text_features = self.encode_text([text])
+        return text_features.cpu().numpy().tolist()[0]
+
+    def embed_image(self, uris: List[str]):
+        # read image and transform
+        images = [Image.open(image_path) for image_path in uris]
+        transformed_images = [
+            self.transform(
+                image,
+                return_tensors="pt",
+            )
+            for image in images
+        ]
+        image_features = self.encode_image(transformed_images)
+        return image_features.cpu().numpy().tolist()
+
+
+if __name__ == "__main__":
+    model = CLIPModel("AI-ModelScope/chinese-clip-vit-large-patch14-336px")
+    model.embed_image(
+        [
+            "custom_eval/multimodal/images/AMNH.jpg",
+            "custom_eval/multimodal/images/AMNH.jpg",
+        ]
+    )
+    model.encode_text(["我喜欢吃饭" * 1000])
+    print("done")