PyPI - wisent - Versions diffs - 0.7.379__py3-none-any.whl → 0.7.901__py3-none-any.whl - Mend

wisent 0.7.379 py3-none-any.whl → 0.7.901 py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (1020)

wisent/examples/scripts/results/test_hrm8k_pairs.json DELETED

@@ -1,14 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Question: $f$\uac00 \ud568\uc218\uc774\uace0, $f^{-1}$\uc774 $f$\uc758 \uc5ed\ud568\uc218\ub77c\uace0 \ud558\uc790. \ub9cc\uc57d $f(1) = 2$, $f(2) = 6$, $f(3) = 5$\ub77c\uba74, $f^{-1}(f^{-1}(6))$\uc758 \uac12\uc740 \uc5bc\ub9c8\uc778\uac00?\nAnswer:",
-    "positive_response": "1.0",
-    "negative_response": "2.0"
-  },
-  {
-    "pair_id": 1,
-    "prompt": "Question: \uc810 $(4, 0)$\uacfc $(-4, 0)$\ub294 \ub113\uc774\uac00 $80$ \ud3c9\ubc29 \ub2e8\uc704\uc778 \ub9c8\ub984\ubaa8\uc758 \ub450 \ube44\uc5f0\uc18d\uc801\uc778 \uaf2d\uc9d3\uc810\uc785\ub2c8\ub2e4. \ub2e4\ub978 \uaf2d\uc9d3\uc810 \uc911 \ud558\ub098\uac00 $(0, K)$\uc774\uace0 $K > 0$\uc77c \ub54c, $K$\uc758 \uac12\uc740 \uc5bc\ub9c8\uc785\ub2c8\uae4c?\nAnswer:",
-    "positive_response": "10.0",
-    "negative_response": "11.0"
-  }
-]

wisent/examples/scripts/results/test_humaneval_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "humaneval",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Complete the following Python function:\n\nfrom typing import List\n\n\ndef has_close_elements(numbers: L...",
-      "positive_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-      "negative_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    pass  # Incorrect: empty implementation",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False' (log_prob=-0.500), Expected: 'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False' (log_prob=-0.500), Expected: 'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    pass  # Incorrect: empty implementation'"
-      },
-      "both_correct": true
-    }
-  ]
-}
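
The deleted evaluation fixtures share one shape: top-level `num_pairs` and `all_correct`, plus a `pairs` list whose entries carry `positive_evaluation`, `negative_evaluation`, and `both_correct`. A minimal sketch of a consistency check over such a record follows. The function name `find_inconsistencies` is hypothetical, and the rule that `all_correct` and `both_correct` are conjunctions of the per-evaluation `correct` flags is an assumption inferred from the field names, not something the package confirms.

```python
from typing import Any


def find_inconsistencies(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in one evaluation record.

    Assumption (not confirmed by the package): `both_correct` is the AND of the
    two per-evaluation `correct` flags, and `all_correct` is the AND of every
    pair's `both_correct`.
    """
    problems: list[str] = []
    pairs = record.get("pairs", [])

    # The declared pair count should match the actual list length.
    if record.get("num_pairs") != len(pairs):
        problems.append("num_pairs does not match len(pairs)")

    for pair in pairs:
        pos_ok = pair["positive_evaluation"]["correct"]
        neg_ok = pair["negative_evaluation"]["correct"]
        if pair["both_correct"] != (pos_ok and neg_ok):
            problems.append(f"pair {pair['pair_id']}: both_correct mismatch")

    if record.get("all_correct") != all(p["both_correct"] for p in pairs):
        problems.append("all_correct does not match per-pair flags")

    return problems


# A trimmed record using only field names that appear in the fixture above.
sample = {
    "task_name": "humaneval",
    "num_pairs": 1,
    "all_correct": True,
    "pairs": [
        {
            "pair_id": 0,
            "positive_evaluation": {"correct": True},
            "negative_evaluation": {"correct": True},
            "both_correct": True,
        }
    ],
}

print(find_inconsistencies(sample))
```

A file on disk could be checked the same way by passing the result of `json.load` to the function.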

wisent/examples/scripts/results/test_humaneval_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Complete the following Python function:\n\nfrom typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"",
-    "positive_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-    "negative_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    pass  # Incorrect: empty implementation"
-  }
-]

wisent/examples/scripts/results/test_humaneval_plus_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "humaneval_plus",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Complete the following Python function:\n\nfrom typing import List\n\n\ndef has_close_elements(numbers: L...",
-      "positive_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-      "negative_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    pass  # Incorrect: empty implementation",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False' (log_prob=-0.500), Expected: 'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False' (log_prob=-0.500), Expected: 'from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    pass  # Incorrect: empty implementation'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_humaneval_plus_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Complete the following Python function:\n\nfrom typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"",
-    "positive_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-    "negative_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    pass  # Incorrect: empty implementation"
-  }
-]

wisent/examples/scripts/results/test_ifeval_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "ifeval",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Follow these instructions carefully:\n\nWrite a 300+ word summary of the wikipedia page \"https://en.wi...",
-      "positive_response": "I will follow all 3 specified constraints",
-      "negative_response": "I will not follow the instructions properly",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'I will follow all 3 specified constraints' (log_prob=-0.500), Expected: 'I will follow all 3 specified constraints'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'I will follow all 3 specified constraints' (log_prob=-0.500), Expected: 'I will not follow the instructions properly'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_ifeval_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Follow these instructions carefully:\n\nWrite a 300+ word summary of the wikipedia page \"https://en.wikipedia.org/wiki/Raymond_III,_Count_of_Tripoli\". Do not use any commas and highlight at least 3 sections that has titles in markdown format, for example *highlighted section part 1*, *highlighted section part 2*, *highlighted section part 3*.",
-    "positive_response": "I will follow all 3 specified constraints",
-    "negative_response": "I will not follow the instructions properly"
-  }
-]

wisent/examples/scripts/results/test_instruct_humaneval/test_instruct_humaneval_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "instruct_humaneval",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Question: from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -...",
-      "positive_response": "from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-      "negative_response": "from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n           \n           # CORRUPTED     distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False' (log_prob=-0.500), Expected: 'from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False' (log_prob=-0.500), Expected: 'from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n           \n           # CORRUPTED     distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_instruct_humaneval/test_instruct_humaneval_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Question: from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n\nWhat is the answer?",
-    "positive_response": "from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-    "negative_response": "from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n           \n           # CORRUPTED     distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False"
-  }
-]

wisent/examples/scripts/results/test_instruct_humaneval_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "instruct_humaneval",
-  "model_name": "mock",
-  "evaluator_name": "docker_code",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Write a function has_close_elements(numbers: List[float], threshold: float) -> bool to solve the fol...",
-      "positive_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-      "negative_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    pass  # Incorrect: empty implementation",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Code executed successfully. Status: ok"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 1.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Code execution failed. Status: runtime_error\nstdout: \nstderr: Traceback (most recent call last):\n  File \"/work/tests.py\", line 20, in <module>\n    check(has_close_elements)\n  File \"/work/tests.py\", line 10, in check\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nAssertionError\n"
-      },
-      "both_correct": true
-    }
-  ]
-}
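
The `docker_code` details above record an `AssertionError` on `candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True` for the stub response. The sketch below reproduces that outcome locally, without Docker: the reference body and the stub are copied from the fixture's `positive_response` and `negative_response`, while the names `has_close_elements_stub` and `passes_check` are hypothetical helpers introduced only for this illustration and are not part of the package.

```python
from typing import Callable, List, Optional


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """Reference body, as in the fixture's positive_response."""
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False


def has_close_elements_stub(numbers: List[float], threshold: float) -> Optional[bool]:
    """Stub body, as in the fixture's negative_response; implicitly returns None."""
    pass  # Incorrect: empty implementation


def passes_check(candidate: Callable[[List[float], float], Optional[bool]]) -> bool:
    """Run the single assertion quoted in the traceback and report pass/fail."""
    try:
        assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
    except AssertionError:
        return False
    return True


print(passes_check(has_close_elements))       # True: 3.9 and 4.0 are closer than 0.3
print(passes_check(has_close_elements_stub))  # False: None == True is False
```

The stub fails because a function whose body is only `pass` returns `None`, so the comparison with `True` is false and the assertion raises.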

wisent/examples/scripts/results/test_instruct_humaneval_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Write a function has_close_elements(numbers: List[float], threshold: float) -> bool to solve the following problem:\nCheck if in given list of numbers, are any two numbers closer to each other than given threshold.\n>>> has_close_elements([1.0, 2.0, 3.0], 0.5)\nFalse\n>>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\nTrue\n\nfrom typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"",
-    "positive_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    for idx, elem in enumerate(numbers):\n        for idx2, elem2 in enumerate(numbers):\n            if idx != idx2:\n                distance = abs(elem - elem2)\n                if distance < threshold:\n                    return True\n\n    return False",
-    "negative_response": "from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\n    given threshold.\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n    False\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n    True\n    \"\"\"\n    pass  # Incorrect: empty implementation"
-  }
-]

wisent/examples/scripts/results/test_inverse_scaling_evaluation.json DELETED

@@ -1,51 +0,0 @@
-{
-  "task_name": "inverse_scaling",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 2,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "A stadimeter with a precision of +/- 10 meters takes a measurement of 9790 meters, and an analytical...",
-      "positive_response": " B",
-      "negative_response": " C",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' B' (log_prob=-0.500), Expected: ' B'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' B' (log_prob=-0.500), Expected: ' C'"
-      },
-      "both_correct": true
-    },
-    {
-      "pair_id": 1,
-      "prompt": "NOTE: To propagate uncertainty when multiplying or dividing two measurements, count the number of si...",
-      "positive_response": " B",
-      "negative_response": " C",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' B' (log_prob=-0.500), Expected: ' B'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' B' (log_prob=-0.500), Expected: ' C'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_inverse_scaling_hindsight_neglect_10shot_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "inverse_scaling_hindsight_neglect_10shot",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Question: Michael has the option to play a game where Michael has a 91 percent chance of losing 900 ...",
-      "positive_response": " N",
-      "negative_response": " Y",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' N' (log_prob=-0.500), Expected: ' N'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' N' (log_prob=-0.500), Expected: ' Y'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_inverse_scaling_hindsight_neglect_10shot_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Question: Michael has the option to play a game where Michael has a 91 percent chance of losing 900 dollars and a 9 percent chance of earning 5 dollars. Michael plays the game and ends up losing 900 dollars. Did Michael make the right decision? Choose Y or N.\nAnswer: N\nQuestion: David has the option to play a game where David has a 30 percent chance of losing 5 dollars and a 70 percent chance of earning 250 dollars. David plays the game and ends up earning 250 dollars. Did David make the right decision? Choose Y or N.\nAnswer: Y\nQuestion: Susan has the option to play a game where Susan has a 30 percent chance of losing 22 dollars and a 70 percent chance of earning 500 dollars. Susan plays the game and ends up earning 500 dollars. Did Susan make the right decision? Choose Y or N.\nAnswer: Y\nQuestion: David has the option to play a game where David has a 94 percent chance of earning 5000 dollars and a 6 percent chance of losing 25 dollars. David plays the game and ends up earning 5000 dollars. Did David make the right decision? Choose Y or N.\nAnswer: Y\nQuestion: David has the option to play a game where David has a 70 percent chance of losing 10000 dollars and a 30 percent chance of earning 25 dollars. David plays the game and ends up losing 10000 dollars. Did David make the right decision? Choose Y or N.\nAnswer: N\nQuestion: David has the option to play a game where David has a 92 percent chance of earning 200 dollars and a 8 percent chance of losing 2 dollars. David plays the game and ends up earning 200 dollars. Did David make the right decision? Choose Y or N.\nAnswer: Y\nQuestion: Susan has the option to play a game where Susan has a 25 percent chance of earning 12 dollars and a 75 percent chance of losing 500 dollars. Susan plays the game and ends up losing 500 dollars. Did Susan make the right decision? Choose Y or N.\nAnswer: N\nQuestion: John has the option to play a game where John has a 85 percent chance of losing 5000 dollars and a 15 percent chance of earning 4 dollars. John plays the game and ends up losing 5000 dollars. Did John make the right decision? Choose Y or N.\nAnswer: N\nQuestion: Michael has the option to play a game where Michael has a 20 percent chance of earning 15 dollars and a 80 percent chance of losing 50 dollars. Michael plays the game and ends up losing 50 dollars. Did Michael make the right decision? Choose Y or N.\nAnswer: N\nQuestion: Susan has the option to play a game where Susan has a 30 percent chance of losing 1 dollars and a 70 percent chance of earning 1000 dollars. Susan plays the game and ends up earning 1000 dollars. Did Susan make the right decision? Choose Y or N.\nAnswer: Y\nQuestion: David has the option to play a game where David has a 94 percent chance of losing 50 dollars and a 6 percent chance of earning 5 dollars. David plays the game and ends up earning 5 dollars. Did David make the right decision? Choose Y or N.\nAnswer:",
-    "positive_response": " N",
-    "negative_response": " Y"
-  }
-]

wisent/examples/scripts/results/test_inverse_scaling_mc/test_inverse_scaling_mc_evaluation.json DELETED

@@ -1,51 +0,0 @@
-{
-  "task_name": "inverse_scaling_mc",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 2,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "A coincidence telemeter with a precision of +/- 200 meters takes a measurement of 593500 meters, and...",
-      "positive_response": " A",
-      "negative_response": " B",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' A' (log_prob=-0.500), Expected: ' A'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' A' (log_prob=-0.500), Expected: ' B'"
-      },
-      "both_correct": true
-    },
-    {
-      "pair_id": 1,
-      "prompt": "NOTE: To propagate uncertainty when multiplying or dividing two measurements, count the number of si...",
-      "positive_response": " A",
-      "negative_response": " B",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' A' (log_prob=-0.500), Expected: ' A'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: ' A' (log_prob=-0.500), Expected: ' B'"
-      },
-      "both_correct": true
-    }
-  ]
-}
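
The deleted evaluation fixture above nests per-pair results under `pairs` and repeats summary flags at the top level. A minimal sketch of a consistency check for a report of that shape follows. The field names are taken from the fixture, but the rules checked here (`num_pairs` equals the number of pairs, `both_correct` is the conjunction of the two `correct` flags, `all_correct` holds when every pair is `both_correct`) are assumptions about the intended meaning, and `check_report` is a hypothetical helper, not part of wisent.

```python
def check_report(report: dict) -> list[str]:
    """Return a list of inconsistencies found in an evaluation report.

    Assumed rules (not confirmed by the package):
      - num_pairs == len(pairs)
      - both_correct == positive_evaluation.correct and negative_evaluation.correct
      - all_correct == every pair has both_correct set
    """
    problems = []
    pairs = report.get("pairs", [])

    if report.get("num_pairs") != len(pairs):
        problems.append("num_pairs does not match len(pairs)")

    for pair in pairs:
        expected = (
            pair["positive_evaluation"]["correct"]
            and pair["negative_evaluation"]["correct"]
        )
        if pair.get("both_correct") != expected:
            problems.append(f"pair {pair.get('pair_id')}: both_correct mismatch")

    if report.get("all_correct") != all(p.get("both_correct") for p in pairs):
        problems.append("all_correct does not match the per-pair flags")

    return problems


# Abbreviated report with the same keys as the fixture (values shortened).
sample = {
    "task_name": "inverse_scaling_mc",
    "num_pairs": 1,
    "all_correct": True,
    "pairs": [
        {
            "pair_id": 0,
            "positive_evaluation": {"correct": True},
            "negative_evaluation": {"correct": True},
            "both_correct": True,
        }
    ],
}

print(check_report(sample))
```

An empty list means the summary flags agree with the per-pair entries under the assumed rules.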

wisent/examples/scripts/results/test_inverse_scaling_mc/test_inverse_scaling_mc_pairs.json DELETED

@@ -1,14 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "A coincidence telemeter with a precision of +/- 200 meters takes a measurement of 593500 meters, and a storage container with a precision of +/- 0.2 liters measures a volume as 0.3 liters. Using a calculator, you divide the numbers and get the output 1978333.333333333333. Using the right level of precision, what is the result?\r\nA. 2000000 meters/liter\r\nB. 1978300 meters/liter\r\nC. 1978333.3 meters/liter\r\nAnswer:",
-    "positive_response": " A",
-    "negative_response": " B"
-  },
-  {
-    "pair_id": 1,
-    "prompt": "NOTE: To propagate uncertainty when multiplying or dividing two measurements, count the number of significant figures in each. Your result should be reported to the number of significant figures in the measurement having the lesser number of significant figures. Note that 'significant figures' are different than 'decimal places'; see rules at https://en.wikipedia.org/w/index.php?title=Significant_figures&oldid=1114415444#Identifying_significant_figures .\r\nRounding a number to N significant figures is similar to rounding to N digits after the decimal point, except that we start counting from the beginning of the number. For example, 71.25150 rounded to three significant figures is 71.3, to four is 71.25, and to one is 70. (If the N + 1 digit is 5 followed by nothing or by zeros only, use the 'round half to even' tiebreaking rule).\r\n---\r\n\r\nA stadimeter with a precision of +/- 0.1 meters measures a distance of 99.8 meters and a stadimeter with a precision of +/- 2 meters reads 8242 meters when measuring a distance between two different points. After multiplying the former number by the latter your computer gives the solution 822551.600000000000. How can we express this solution to the correct number of significant figures?\r\nA. 823000 meters^2\r\nB. 822551 meters^2\r\nC. 822551.600 meters^2\r\nAnswer:",
-    "positive_response": " A",
-    "negative_response": " B"
-  }
-]
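
The second prompt in the fixture above states a rule for rounding to N significant figures, with round-half-to-even as the tiebreak. A minimal sketch of that rule using Python's `decimal` module follows. `round_sig` is a hypothetical helper written for illustration, not something from wisent or the dataset, and passing the value as a string is a choice made here to avoid binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig(value: str, sig: int) -> Decimal:
    """Round a decimal string to `sig` significant figures (half to even)."""
    d = Decimal(value)
    if d == 0:
        return d
    # adjusted() is the exponent of the most significant digit,
    # so this is the exponent of the last digit we keep.
    exponent = d.adjusted() - sig + 1
    quantum = Decimal(1).scaleb(exponent)
    return d.quantize(quantum, rounding=ROUND_HALF_EVEN)


# Worked example quoted in the prompt: 71.25150 to three, four, and one figures.
print(round_sig("71.25150", 3))       # 71.3
print(round_sig("71.25150", 4))       # 71.25
print(int(round_sig("71.25150", 1)))  # 70

# Calculator outputs from the two prompts, rounded to the figure count
# of the least precise input (1 for 0.3 liters, 3 for 99.8 meters).
print(int(round_sig("1978333.333333333333", 1)))  # 2000000
print(int(round_sig("822551.600000000000", 3)))   # 823000
```

Both results match option A, the `positive_response` recorded for each pair.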

wisent/examples/scripts/results/test_inverse_scaling_pairs.json DELETED

@@ -1,14 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "A stadimeter with a precision of +/- 10 meters takes a measurement of 9790 meters, and an analytical balance with a precision of +/- 4 grams measures a mass as 10 grams. Your calculator app yields the solution 979.000000000000 when dividing the first value by the second value. Using the right number of significant figures, what is the result?\r\nA. 979.00 meters/gram\r\nB. 980 meters/gram\r\nC. 979.000000000000 meters/gram\r\nD. 970 meters/gram\r\nAnswer:",
-    "positive_response": " B",
-    "negative_response": " C"
-  },
-  {
-    "pair_id": 1,
-    "prompt": "NOTE: To propagate uncertainty when multiplying or dividing two measurements, count the number of significant figures in each. Your result should be reported to the number of significant figures in the measurement having the lesser number of significant figures. Note that 'significant figures' are different than 'decimal places'; see rules at https://en.wikipedia.org/w/index.php?title=Significant_figures&oldid=1114415444#Identifying_significant_figures .\r\nRounding a number to N significant figures is similar to rounding to N digits after the decimal point, except that we start counting from the beginning of the number. For example, 71.25150 rounded to three significant figures is 71.3, to four is 71.25, and to one is 70. (If the N + 1 digit is 5 followed by nothing or by zeros only, use the 'round half to even' tiebreaking rule).\r\n---\r\n\r\nA coincidence telemeter with a precision of +/- 1 meters takes a measurement of 4 meters, and a rangefinder with a precision of +/- 0.0003 meters measures a distance between two different points as 0.0007 meters. Using a calculator, you divide the first number by the second number and get the output 5714.285714285714. If we write this output to the right level of precision, what is the result?\r\nA. 5714.3\r\nB. 6000\r\nC. 5714\r\nAnswer:",
-    "positive_response": " B",
-    "negative_response": " C"
-  }
-]

wisent/examples/scripts/results/test_iwslt2017-ar-en_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "iwslt2017-ar-en",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Translate the following from ar to en:\n\u0627\u0644\u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0639\u0631\u0636\u062a \u0647\u0627\u062a\u064a\u0646 \u0627\u0644\u0634\u0631\u064a\u062d\u062a\u064a\u0646 \u0644\u0643\u064a \u0623\u0648\u0636\u062d \u0623\u0646 \u0627\u0644\u063a\u0637\u0627\u0621 \u0627\u0644\u062c\u0644\u064a\u062f\u064a...",
-      "positive_response": "Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.",
-      "negative_response": "demonstrate years size cap, ice slides been the year the that Last the has of which of 48 three million the arctic most so showed two I shrunk lower by last these that for percent. states, 40 has",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.' (log_prob=-0.500), Expected: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.' (log_prob=-0.500), Expected: 'demonstrate years size cap, ice slides been the year the that Last the has of which of 48 three million the arctic most so showed two I shrunk lower by last these that for percent. states, 40 has'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_iwslt2017-ar-en_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Translate the following from ar to en:\n\u0627\u0644\u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0639\u0631\u0636\u062a \u0647\u0627\u062a\u064a\u0646 \u0627\u0644\u0634\u0631\u064a\u062d\u062a\u064a\u0646 \u0644\u0643\u064a \u0623\u0648\u0636\u062d \u0623\u0646 \u0627\u0644\u063a\u0637\u0627\u0621 \u0627\u0644\u062c\u0644\u064a\u062f\u064a \u0627\u0644\u0642\u0637\u0628\u064a\u060c \u0627\u0644\u0630\u064a \u0643\u0627\u0646 \u062e\u0644\u0627\u0644 \u0627\u0644\u062b\u0644\u0627\u062b\u0629 \u0645\u0644\u0627\u064a\u064a\u0646 \u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0641\u064a \u062d\u062c\u0645 \u0623\u0642\u0644\u0647 \u062b\u0645\u0627\u0646\u064a\u0629 \u0648\u0623\u0631\u0628\u0639\u064a\u0646 \u060c \u0642\u062f \u062a\u0642\u0644\u0635 \u0628\u0646\u0633\u0628\u0629 \u0623\u0631\u0628\u0639\u064a\u0646 \u0641\u064a \u0627\u0644\u0645\u0627\u0626\u0629.",
-    "positive_response": "Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.",
-    "negative_response": "demonstrate years size cap, ice slides been the year the that Last the has of which of 48 three million the arctic most so showed two I shrunk lower by last these that for percent. states, 40 has"
-  }
-]

wisent/examples/scripts/results/test_iwslt2017-en-ar_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "iwslt2017-en-ar",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Translate the following from ar to en:\n\u0627\u0644\u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0639\u0631\u0636\u062a \u0647\u0627\u062a\u064a\u0646 \u0627\u0644\u0634\u0631\u064a\u062d\u062a\u064a\u0646 \u0644\u0643\u064a \u0623\u0648\u0636\u062d \u0623\u0646 \u0627\u0644\u063a\u0637\u0627\u0621 \u0627\u0644\u062c\u0644\u064a\u062f\u064a...",
-      "positive_response": "Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.",
-      "negative_response": "by two size the lower these cap, years the states, the showed 40 million so has ice that which Last demonstrate of for most that has shrunk 48 year been slides three the last arctic I of percent.",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.' (log_prob=-0.500), Expected: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.' (log_prob=-0.500), Expected: 'by two size the lower these cap, years the states, the showed 40 million so has ice that which Last demonstrate of for most that has shrunk 48 year been slides three the last arctic I of percent.'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_iwslt2017-en-ar_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Translate the following from ar to en:\n\u0627\u0644\u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0639\u0631\u0636\u062a \u0647\u0627\u062a\u064a\u0646 \u0627\u0644\u0634\u0631\u064a\u062d\u062a\u064a\u0646 \u0644\u0643\u064a \u0623\u0648\u0636\u062d \u0623\u0646 \u0627\u0644\u063a\u0637\u0627\u0621 \u0627\u0644\u062c\u0644\u064a\u062f\u064a \u0627\u0644\u0642\u0637\u0628\u064a\u060c \u0627\u0644\u0630\u064a \u0643\u0627\u0646 \u062e\u0644\u0627\u0644 \u0627\u0644\u062b\u0644\u0627\u062b\u0629 \u0645\u0644\u0627\u064a\u064a\u0646 \u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0641\u064a \u062d\u062c\u0645 \u0623\u0642\u0644\u0647 \u062b\u0645\u0627\u0646\u064a\u0629 \u0648\u0623\u0631\u0628\u0639\u064a\u0646 \u060c \u0642\u062f \u062a\u0642\u0644\u0635 \u0628\u0646\u0633\u0628\u0629 \u0623\u0631\u0628\u0639\u064a\u0646 \u0641\u064a \u0627\u0644\u0645\u0627\u0626\u0629.",
-    "positive_response": "Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.",
-    "negative_response": "by two size the lower these cap, years the states, the showed 40 million so has ice that which Last demonstrate of for most that has shrunk 48 year been slides three the last arctic I of percent."
-  }
-]

wisent/examples/scripts/results/test_iwslt2017_ar_en/test_iwslt2017-ar-en_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "iwslt2017-ar-en",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Translate the following from ar to en:\n\u0627\u0644\u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0639\u0631\u0636\u062a \u0647\u0627\u062a\u064a\u0646 \u0627\u0644\u0634\u0631\u064a\u062d\u062a\u064a\u0646 \u0644\u0643\u064a \u0623\u0648\u0636\u062d \u0623\u0646 \u0627\u0644\u063a\u0637\u0627\u0621 \u0627\u0644\u062c\u0644\u064a\u062f\u064a...",
-      "positive_response": "Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.",
-      "negative_response": "ice 48 million arctic the two lower the years states, 40 size I the shrunk which that been these percent. three showed slides so by of Last last year cap, that has most demonstrate has of for the",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.' (log_prob=-0.500), Expected: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.' (log_prob=-0.500), Expected: 'ice 48 million arctic the two lower the years states, 40 size I the shrunk which that been these percent. three showed slides so by of Last last year cap, that has most demonstrate has of for the'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_iwslt2017_ar_en/test_iwslt2017-ar-en_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Translate the following from ar to en:\n\u0627\u0644\u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0639\u0631\u0636\u062a \u0647\u0627\u062a\u064a\u0646 \u0627\u0644\u0634\u0631\u064a\u062d\u062a\u064a\u0646 \u0644\u0643\u064a \u0623\u0648\u0636\u062d \u0623\u0646 \u0627\u0644\u063a\u0637\u0627\u0621 \u0627\u0644\u062c\u0644\u064a\u062f\u064a \u0627\u0644\u0642\u0637\u0628\u064a\u060c \u0627\u0644\u0630\u064a \u0643\u0627\u0646 \u062e\u0644\u0627\u0644 \u0627\u0644\u062b\u0644\u0627\u062b\u0629 \u0645\u0644\u0627\u064a\u064a\u0646 \u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0641\u064a \u062d\u062c\u0645 \u0623\u0642\u0644\u0647 \u062b\u0645\u0627\u0646\u064a\u0629 \u0648\u0623\u0631\u0628\u0639\u064a\u0646 \u060c \u0642\u062f \u062a\u0642\u0644\u0635 \u0628\u0646\u0633\u0628\u0629 \u0623\u0631\u0628\u0639\u064a\u0646 \u0641\u064a \u0627\u0644\u0645\u0627\u0626\u0629.",
-    "positive_response": "Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.",
-    "negative_response": "ice 48 million arctic the two lower the years states, 40 size I the shrunk which that been these percent. three showed slides so by of Last last year cap, that has most demonstrate has of for the"
-  }
-]

wisent/examples/scripts/results/test_iwslt2017_en_ar/test_iwslt2017-en-ar_evaluation.json DELETED

@@ -1,30 +0,0 @@
-{
-  "task_name": "iwslt2017-en-ar",
-  "model_name": "mock",
-  "evaluator_name": "log_likelihoods",
-  "num_pairs": 1,
-  "all_correct": true,
-  "pairs": [
-    {
-      "pair_id": 0,
-      "prompt": "Translate the following from ar to en:\n\u0627\u0644\u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0639\u0631\u0636\u062a \u0647\u0627\u062a\u064a\u0646 \u0627\u0644\u0634\u0631\u064a\u062d\u062a\u064a\u0646 \u0644\u0643\u064a \u0623\u0648\u0636\u062d \u0623\u0646 \u0627\u0644\u063a\u0637\u0627\u0621 \u0627\u0644\u062c\u0644\u064a\u062f\u064a...",
-      "positive_response": "Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.",
-      "negative_response": "states, two year million cap, ice by of been last I slides for size has lower has three the that the of 48 so that the these shrunk most percent. showed Last the which demonstrate 40 arctic years",
-      "positive_evaluation": {
-        "ground_truth": "TRUTHFUL",
-        "confidence": 1.0,
-        "expected": "TRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.' (log_prob=-0.500), Expected: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.'"
-      },
-      "negative_evaluation": {
-        "ground_truth": "UNTRUTHFUL",
-        "confidence": 0.0,
-        "expected": "UNTRUTHFUL",
-        "correct": true,
-        "details": "Predicted: 'Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.' (log_prob=-0.500), Expected: 'states, two year million cap, ice by of been last I slides for size has lower has three the that the of 48 so that the these shrunk most percent. showed Last the which demonstrate 40 arctic years'"
-      },
-      "both_correct": true
-    }
-  ]
-}

wisent/examples/scripts/results/test_iwslt2017_en_ar/test_iwslt2017-en-ar_pairs.json DELETED

@@ -1,8 +0,0 @@
-[
-  {
-    "pair_id": 0,
-    "prompt": "Translate the following from ar to en:\n\u0627\u0644\u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0639\u0631\u0636\u062a \u0647\u0627\u062a\u064a\u0646 \u0627\u0644\u0634\u0631\u064a\u062d\u062a\u064a\u0646 \u0644\u0643\u064a \u0623\u0648\u0636\u062d \u0623\u0646 \u0627\u0644\u063a\u0637\u0627\u0621 \u0627\u0644\u062c\u0644\u064a\u062f\u064a \u0627\u0644\u0642\u0637\u0628\u064a\u060c \u0627\u0644\u0630\u064a \u0643\u0627\u0646 \u062e\u0644\u0627\u0644 \u0627\u0644\u062b\u0644\u0627\u062b\u0629 \u0645\u0644\u0627\u064a\u064a\u0646 \u0633\u0646\u0629 \u0627\u0644\u0645\u0627\u0636\u064a\u0629 \u0641\u064a \u062d\u062c\u0645 \u0623\u0642\u0644\u0647 \u062b\u0645\u0627\u0646\u064a\u0629 \u0648\u0623\u0631\u0628\u0639\u064a\u0646 \u060c \u0642\u062f \u062a\u0642\u0644\u0635 \u0628\u0646\u0633\u0628\u0629 \u0623\u0631\u0628\u0639\u064a\u0646 \u0641\u064a \u0627\u0644\u0645\u0627\u0626\u0629.",
-    "positive_response": "Last year I showed these two slides so that  demonstrate that the arctic ice cap,  which for most of the last three million years  has been the size of the lower 48 states,  has shrunk by 40 percent.",
-    "negative_response": "states, two year million cap, ice by of been last I slides for size has lower has three the that the of 48 so that the these shrunk most percent. showed Last the which demonstrate 40 arctic years"
-  }
-]
