adaptive-sdk 0.1.4__py3-none-any.whl → 0.1.5__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1121,6 +1121,8 @@ class JudgeConfigOutputFields(GraphQLField):
     @classmethod
     def examples(cls) -> 'JudgeExampleFields':
         return JudgeExampleFields('examples')
+    system_template: 'JudgeConfigOutputGraphQLField' = JudgeConfigOutputGraphQLField('systemTemplate')
+    user_template: 'JudgeConfigOutputGraphQLField' = JudgeConfigOutputGraphQLField('userTemplate')
 
     @classmethod
     def model(cls) -> 'ModelFields':
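
The two added attributes follow the pattern used throughout the generated `custom_fields.py`: a snake_case Python attribute wrapping the camelCase GraphQL field name. A minimal, self-contained sketch of that mapping (the `GraphQLField` below is a toy stand-in for illustration, not the SDK's real class):

```python
# Toy stand-in for the generated field-wrapper pattern; the real
# JudgeConfigOutputGraphQLField lives in the SDK's graphql_client package.
class GraphQLField:
    def __init__(self, field_name: str) -> None:
        self.field_name = field_name  # the camelCase name sent in queries


class JudgeConfigOutputFields:
    # Python keeps snake_case; GraphQL sees camelCase.
    system_template = GraphQLField("systemTemplate")
    user_template = GraphQLField("userTemplate")


print(JudgeConfigOutputFields.system_template.field_name)  # -> systemTemplate
```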
@@ -239,6 +239,8 @@ class JudgeConfigInput(BaseModel):
     model: str
     criteria: str
     examples: List['JudgeExampleInput']
+    system_template: str = Field(alias='systemTemplate')
+    user_template: str = Field(alias='userTemplate')
 
 class JudgeCreate(BaseModel):
     """@private"""
@@ -78,6 +78,102 @@ class GraderCreator(SyncAPIResource, UseCaseResource): # type: ignore[misc]
             model=judge_model,
             criteria=criteria,
             examples=parsed_examples,
+            systemTemplate=r"""You are an expert evaluator that evaluates completions generated by an AI model on a fixed criterion.
+You will be given all elements of an interaction between a human and an AI model:
+The full context of the conversation so far leading up to the last user turn/question is under the CONTEXT header. It may contain extra contextual information.
+The last user turn/question is under the USER QUESTION header. It may contain extra contextual information.
+The model's completion is under the COMPLETION TO EVALUATE header.
+The evaluation criterion is under the EVALUATION CRITERION section.
+{{#if examples.length}}
+
+CRITICAL: The annotations below are GROUND TRUTH provided by expert human annotators. You MUST follow them exactly, even if they seem counter-intuitive to you.
+
+In order to analyze and score a completion, you always run the following steps without exception:
+First, you read the CONTEXT, USER QUESTION and COMPLETION TO EVALUATE.
+Then, you MUST check if there is an annotation that matches (or is very similar to) the current case:
+- If the USER QUESTION and COMPLETION TO EVALUATE match an annotation, you MUST use the annotation's score and reasoning. Do NOT apply your own judgment.
+- If there is a similar annotation, you MUST follow the same reasoning pattern and scoring approach, even if it contradicts your intuition.
+- The annotations define what is considered PASS/FAIL for this specific task. Your personal understanding of the criterion is IRRELEVANT.
+Then, ONLY if no similar annotation exists, you analyze the COMPLETION TO EVALUATE and assign a score according to the criterion.
+
+Rules to follow:
+- You must always evaluate the COMPLETION TO EVALUATE based solely on the USER QUESTION, and never on an intermediary question that might have been asked in the CONTEXT. The CONTEXT is there for context only.
+- Do not include text that is in the CONTEXT to make your judgement; you are evaluating the COMPLETION TO EVALUATE text only.
+- You must not use the original instructions given to the model in the CONTEXT for your judgement. Focus only on the ANNOTATIONS AND EVALUATION CRITERION without any other influencing factors.
+- You are forbidden to return a score other than PASS, FAIL or NA for each criterion.
+- If the criterion is conditional, and is not applicable to the specific USER QUESTION + COMPLETION TO EVALUATE pair, you must score it as NA.
+- Return a single score, no matter how many things are evaluated or contemplated in the criterion. A PASS means the completion complied with everything.
+- ANNOTATIONS ARE ABSOLUTE TRUTH. If an annotation says something is PASS, it is PASS, regardless of what you think.
+{{else}}
+
+In order to analyze and score a completion, you always run the following steps without exception:
+First, you read the CONTEXT, USER QUESTION and COMPLETION TO EVALUATE.
+Then, you analyze the COMPLETION TO EVALUATE, and assign it a PASS, FAIL or NA score according to the criterion: FAIL if the completion does not meet the criterion, PASS if it does, and NA if the criterion is not applicable to the example.
+
+Rules to follow:
+- You must always evaluate the COMPLETION TO EVALUATE based solely on the USER QUESTION, and never on an intermediary question that might have been asked in the CONTEXT. The CONTEXT is there for context only.
+- Do not include text that is in the CONTEXT to make your judgement; you are evaluating the COMPLETION TO EVALUATE text only.
+- You must not use the original instructions given to the model in the CONTEXT for your judgement. Focus only on the EVALUATION CRITERION.
+- You are forbidden to return a score other than PASS, FAIL or NA for each criterion.
+- If the criterion is conditional, and is not applicable to the specific USER QUESTION + COMPLETION TO EVALUATE pair, you must score it as NA.
+- Return a single score, no matter how many things are evaluated or contemplated in the criterion. A PASS means the completion complied with everything.
+{{/if}}
+
+
+
+Finally, output an explanation for your judgement and the score for the criterion, as exemplified below.
+Your output should be a well-formatted JSON string that conforms to the JSON schema below. Do not output anything else other than the JSON string.
+
+Here is the output JSON schema you must strictly follow, with field descriptions and value types. All fields are required.
+{
+"reasoning": str,
+"score": Literal["PASS", "FAIL", "NA"]
+}
+
+reasoning: Reasoning string to support the rationale behind the score.{{#if examples.length}} If using an annotation, you MUST reference it explicitly (e.g., "Based on ANNOTATION 0...").{{/if}}
+score: The literal score for the sample
+
+Evaluate only the final COMPLETION TO EVALUATE with regard to the USER QUESTION shown. Do not return any preamble or explanations. Return exactly one valid JSON string.
+{{#each examples}}
+
+### ANNOTATION {{@index}} ###
+CONTEXT
+{{{context_str}}}
+USER QUESTION
+{{{user_question}}}
+COMPLETION TO EVALUATE
+{{{completion}}}
+EVALUATION CRITERION
+{{{../criteria}}}
+OUTPUT
+{{{output_json}}}
+{{/each}}""",
+            userTemplate=r"""CONTEXT
+{{{context_str_without_last_user}}}
+
+USER QUESTION
+{{{last_user_turn_content}}}
+
+COMPLETION TO EVALUATE
+{{{completion}}}
+{{#if examples.length}}
+
+INSTRUCTIONS:
+1. FIRST: Check if this exact case (or a very similar case) appears in the ANNOTATIONS above. If it does, you MUST use that annotation's score and reasoning. Do NOT second-guess it.
+2. If similar cases exist in the annotations, follow the same logic and scoring pattern they demonstrate, even if it contradicts common sense.
+3. ONLY if no relevant annotation exists, apply the general criterion: {{{criteria}}}
+
+Remember: Annotations override everything else, including your intuition and the general criterion.
+{{else}}
+
+EVALUATION CRITERION
+{{{criteria}}}
+{{/if}}
+
+OUTPUT SCHEMA
+{{{output_schema}}}
+
+OUTPUT""",
         )
 
         # Create grader config
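
The system template pins the judge to a strict output contract: a single JSON object with a `reasoning` string and a `PASS`/`FAIL`/`NA` score. A small sketch of how a caller might validate that output (the model class here is hypothetical; only the field names and score literals come from the template above):

```python
import json
from typing import Literal

from pydantic import BaseModel, ValidationError


class JudgeVerdict(BaseModel):
    reasoning: str
    score: Literal["PASS", "FAIL", "NA"]


raw = '{"reasoning": "Based on ANNOTATION 0, the completion matches.", "score": "PASS"}'
try:
    verdict = JudgeVerdict.model_validate(json.loads(raw))
    print(verdict.score)  # PASS
except (json.JSONDecodeError, ValidationError) as err:
    print(f"judge returned malformed output: {err}")
```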
@@ -258,6 +354,102 @@ class AsyncGraderCreator(AsyncAPIResource, UseCaseResource): # type: ignore[misc]
             model=judge_model,
             criteria=criteria,
             examples=parsed_examples,
+            systemTemplate=r"""You are an expert evaluator that evaluates completions generated by an AI model on a fixed criterion.
+You will be given all elements of an interaction between a human and an AI model:
+The full context of the conversation so far leading up to the last user turn/question is under the CONTEXT header. It may contain extra contextual information.
+The last user turn/question is under the USER QUESTION header. It may contain extra contextual information.
+The model's completion is under the COMPLETION TO EVALUATE header.
+The evaluation criterion is under the EVALUATION CRITERION section.
+{{#if examples.length}}
+
+CRITICAL: The annotations below are GROUND TRUTH provided by expert human annotators. You MUST follow them exactly, even if they seem counter-intuitive to you.
+
+In order to analyze and score a completion, you always run the following steps without exception:
+First, you read the CONTEXT, USER QUESTION and COMPLETION TO EVALUATE.
+Then, you MUST check if there is an annotation that matches (or is very similar to) the current case:
+- If the USER QUESTION and COMPLETION TO EVALUATE match an annotation, you MUST use the annotation's score and reasoning. Do NOT apply your own judgment.
+- If there is a similar annotation, you MUST follow the same reasoning pattern and scoring approach, even if it contradicts your intuition.
+- The annotations define what is considered PASS/FAIL for this specific task. Your personal understanding of the criterion is IRRELEVANT.
+Then, ONLY if no similar annotation exists, you analyze the COMPLETION TO EVALUATE and assign a score according to the criterion.
+
+Rules to follow:
+- You must always evaluate the COMPLETION TO EVALUATE based solely on the USER QUESTION, and never on an intermediary question that might have been asked in the CONTEXT. The CONTEXT is there for context only.
+- Do not include text that is in the CONTEXT to make your judgement; you are evaluating the COMPLETION TO EVALUATE text only.
+- You must not use the original instructions given to the model in the CONTEXT for your judgement. Focus only on the ANNOTATIONS AND EVALUATION CRITERION without any other influencing factors.
+- You are forbidden to return a score other than PASS, FAIL or NA for each criterion.
+- If the criterion is conditional, and is not applicable to the specific USER QUESTION + COMPLETION TO EVALUATE pair, you must score it as NA.
+- Return a single score, no matter how many things are evaluated or contemplated in the criterion. A PASS means the completion complied with everything.
+- ANNOTATIONS ARE ABSOLUTE TRUTH. If an annotation says something is PASS, it is PASS, regardless of what you think.
+{{else}}
+
+In order to analyze and score a completion, you always run the following steps without exception:
+First, you read the CONTEXT, USER QUESTION and COMPLETION TO EVALUATE.
+Then, you analyze the COMPLETION TO EVALUATE, and assign it a PASS, FAIL or NA score according to the criterion: FAIL if the completion does not meet the criterion, PASS if it does, and NA if the criterion is not applicable to the example.
+
+Rules to follow:
+- You must always evaluate the COMPLETION TO EVALUATE based solely on the USER QUESTION, and never on an intermediary question that might have been asked in the CONTEXT. The CONTEXT is there for context only.
+- Do not include text that is in the CONTEXT to make your judgement; you are evaluating the COMPLETION TO EVALUATE text only.
+- You must not use the original instructions given to the model in the CONTEXT for your judgement. Focus only on the EVALUATION CRITERION.
+- You are forbidden to return a score other than PASS, FAIL or NA for each criterion.
+- If the criterion is conditional, and is not applicable to the specific USER QUESTION + COMPLETION TO EVALUATE pair, you must score it as NA.
+- Return a single score, no matter how many things are evaluated or contemplated in the criterion. A PASS means the completion complied with everything.
+{{/if}}
+
+
+
+Finally, output an explanation for your judgement and the score for the criterion, as exemplified below.
+Your output should be a well-formatted JSON string that conforms to the JSON schema below. Do not output anything else other than the JSON string.
+
+Here is the output JSON schema you must strictly follow, with field descriptions and value types. All fields are required.
+{
+"reasoning": str,
+"score": Literal["PASS", "FAIL", "NA"]
+}
+
+reasoning: Reasoning string to support the rationale behind the score.{{#if examples.length}} If using an annotation, you MUST reference it explicitly (e.g., "Based on ANNOTATION 0...").{{/if}}
+score: The literal score for the sample
+
+Evaluate only the final COMPLETION TO EVALUATE with regard to the USER QUESTION shown. Do not return any preamble or explanations. Return exactly one valid JSON string.
+{{#each examples}}
+
+### ANNOTATION {{@index}} ###
+CONTEXT
+{{{context_str}}}
+USER QUESTION
+{{{user_question}}}
+COMPLETION TO EVALUATE
+{{{completion}}}
+EVALUATION CRITERION
+{{{../criteria}}}
+OUTPUT
+{{{output_json}}}
+{{/each}}""",
+            userTemplate=r"""CONTEXT
+{{{context_str_without_last_user}}}
+
+USER QUESTION
+{{{last_user_turn_content}}}
+
+COMPLETION TO EVALUATE
+{{{completion}}}
+{{#if examples.length}}
+
+INSTRUCTIONS:
+1. FIRST: Check if this exact case (or a very similar case) appears in the ANNOTATIONS above. If it does, you MUST use that annotation's score and reasoning. Do NOT second-guess it.
+2. If similar cases exist in the annotations, follow the same logic and scoring pattern they demonstrate, even if it contradicts common sense.
+3. ONLY if no relevant annotation exists, apply the general criterion: {{{criteria}}}
+
+Remember: Annotations override everything else, including your intuition and the general criterion.
+{{else}}
+
+EVALUATION CRITERION
+{{{criteria}}}
+{{/if}}
+
+OUTPUT SCHEMA
+{{{output_schema}}}
+
+OUTPUT""",
         )
 
        # Create grader config
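
Both templates use Handlebars-style placeholders: `{{{var}}}` substitutes a value unescaped, and `{{#if examples.length}} … {{else}} … {{/if}}` switches between the few-shot (annotated) and zero-shot branches. A toy re-implementation of just the triple-stash substitution, for illustration only (this is not the renderer the SDK actually uses):

```python
import re


def render_triple_stash(template: str, values: dict[str, str]) -> str:
    """Replace {{{name}}} placeholders; leave unknown placeholders intact."""
    return re.sub(
        r"\{\{\{(\w+)\}\}\}",
        lambda m: values.get(m.group(1), m.group(0)),
        template,
    )


rendered = render_triple_stash(
    "USER QUESTION\n{{{last_user_turn_content}}}\n\nCOMPLETION TO EVALUATE\n{{{completion}}}",
    {"last_user_turn_content": "What is 2 + 2?", "completion": "4"},
)
print(rendered)
```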
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: adaptive-sdk
-Version: 0.1.4
+Version: 0.1.5
 Summary: Python SDK for Adaptive Engine
 Author-email: Vincent Debergue <vincent@adaptive-ml.com>, Joao Moura <joao@adaptive-ml.com>, Yacine Bouraoui <yacine@adaptive-ml.com>
 Requires-Python: >=3.10
@@ -38,7 +38,7 @@ adaptive_sdk/graphql_client/create_role.py,sha256=6aTdNOZxavMyjkH-g01uYOZgpjYWcA
 adaptive_sdk/graphql_client/create_team.py,sha256=6Alt1ralE1-Xvp2wrEaLUHMW5RtiFqz2fIsUYE_2LbM,370
 adaptive_sdk/graphql_client/create_use_case.py,sha256=sekD76jWCo3zNCfMsBGhVYfNSIK4JPPBz9066BOt49g,332
 adaptive_sdk/graphql_client/create_user.py,sha256=gurD0kZgncXt1HBr7Oo5AkK5ubqFKpJvaR1rn506gHo,301
-adaptive_sdk/graphql_client/custom_fields.py,sha256=8sttgH49IIcTg7zS8qD1HVsoGBCIIKxyxmYOd56S6jw,94030
+adaptive_sdk/graphql_client/custom_fields.py,sha256=Jw6y3sMcBr5b18DX7NISst6D-NZDgBAIbIq92x3dKtk,94232
 adaptive_sdk/graphql_client/custom_mutations.py,sha256=-CbU1jLSndKtHg58dJUPnGzJhwTRCchtwjhJtsxUXeI,24216
 adaptive_sdk/graphql_client/custom_queries.py,sha256=rQNbFQ0M7FJylZ-fY-JaMhWdeHSJp7L6N9V3fJEozOQ,16273
 adaptive_sdk/graphql_client/custom_typing_fields.py,sha256=cU1PgxbzDQiM1lBJyB4C1IzNirxJS_NJRbHqi_PLM50,18935
@@ -63,7 +63,7 @@ adaptive_sdk/graphql_client/fragments.py,sha256=zkGLGnbMdoc9vO5PJL-iDnMtIKetNx-8
 adaptive_sdk/graphql_client/get_custom_recipe.py,sha256=7qxBZGQTqpc69k-NwzgFctaHWaRz0tHl7YlVSsEad6U,383
 adaptive_sdk/graphql_client/get_grader.py,sha256=kubHDBtUcq6mZtUR5_Of0QbjnGUPSYuavF3_xwmwbY8,233
 adaptive_sdk/graphql_client/get_judge.py,sha256=urEnHW3XfURi5GAFBPfbqzOZGQDxgsGRA6nZmUKmoMA,224
-adaptive_sdk/graphql_client/input_types.py,sha256=8e4fiqIP0uf9T38iRmoD3HGKFquNvaruwKvN0Ic0BrU,19027
+adaptive_sdk/graphql_client/input_types.py,sha256=Wvz4vZ9UAxnD3zR4RlZESw20K1k73T3I_l1ZJsbtDms,19137
 adaptive_sdk/graphql_client/link_metric.py,sha256=EDH67ckBzzc6MYIGfsmgZRBnjqxLsCGwFUaFMXPEsBY,327
 adaptive_sdk/graphql_client/list_ab_campaigns.py,sha256=SIbU6I2OQkNHt0Gw6YStoiiwJHUk2rfXnpoGLzrFjxc,379
 adaptive_sdk/graphql_client/list_compute_pools.py,sha256=4Qli5Foxm3jztbUAL5gbwqtcrElwwlC4LGJMOMBI6Cc,782
@@ -109,7 +109,7 @@ adaptive_sdk/resources/compute_pools.py,sha256=4eHP8FMkZOsGPjZ-qBvda2PunA6GMyvvJ
 adaptive_sdk/resources/datasets.py,sha256=44Lt6xaZ-YTKy04fce9J7chnfFofKJr_8bfkamDjZNg,4992
 adaptive_sdk/resources/embeddings.py,sha256=-ov_EChHU6PJJOJRtDlCo4sYyr9hwyvRjnBhub8QNFg,3922
 adaptive_sdk/resources/feedback.py,sha256=lujqwFIhxi6iovL8JWL05Kr-gkzR4QEwUXZbTx33raA,14116
-adaptive_sdk/resources/graders.py,sha256=b6q-5Z6x-vAoZuXHl6xFrcwC3S4TPXxu121SpR3fYdk,17230
+adaptive_sdk/resources/graders.py,sha256=ekQQ5fqmLZpZHeLr6iUm6m45wDevoDJdj3mG-axR-m8,29014
 adaptive_sdk/resources/interactions.py,sha256=9A0aKyfE5dhMj-rj6NOiF7kxAl89SXksFsRJXXjPGK8,10810
 adaptive_sdk/resources/jobs.py,sha256=TO79natSIDexj3moat_5hAjTGAy_-p9dn0qYExYeNQM,4305
 adaptive_sdk/resources/models.py,sha256=krQbfMnVkjNqXfPG-8irH_xlloDpFpQiqYsbED3-8z8,18591
@@ -122,6 +122,6 @@ adaptive_sdk/resources/users.py,sha256=SoGWwdDCdhK4KjYOcAws-ZWlW7Edii7D3Vxfdu-NZ
 adaptive_sdk/rest/__init__.py,sha256=Szn4qFr1ChFRxMvaVjeaAsGoFU3oV26xZB-vkRCu2Hk,611
 adaptive_sdk/rest/base_model.py,sha256=gQvP9N3QLDNlWKFfLeT5Cf0WwGFtKxyi8VWidIZn2jA,541
 adaptive_sdk/rest/rest_types.py,sha256=Ln8tEN9JCaOdAxg4Y2CYoAc2oeNGtFOoUx2jx6huBWk,7586
-adaptive_sdk-0.1.4.dist-info/WHEEL,sha256=G2gURzTEtmeR8nrdXUJfNiB3VYVxigPQ-bEQujpNiNs,82
-adaptive_sdk-0.1.4.dist-info/METADATA,sha256=aBapHQQjBXSRTC0LjYyHk_0vl1SppfBmdH4DkrSJf7E,1436
-adaptive_sdk-0.1.4.dist-info/RECORD,,
+adaptive_sdk-0.1.5.dist-info/WHEEL,sha256=G2gURzTEtmeR8nrdXUJfNiB3VYVxigPQ-bEQujpNiNs,82
+adaptive_sdk-0.1.5.dist-info/METADATA,sha256=oGoMvRCrkfiHcH6qqMggQU3C__ryp-m5y0qkPhySHF4,1436
+adaptive_sdk-0.1.5.dist-info/RECORD,,
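
For reference, each RECORD row is `path,sha256=<digest>,<size>`, where the digest is the urlsafe-base64-encoded SHA-256 of the file with padding stripped (the wheel format, PEP 427). A quick sketch to recompute a row for any unpacked file (the path below is illustrative):

```python
import base64
import hashlib
from pathlib import Path

# Illustrative path; substitute any file unpacked from the wheel.
path = Path("adaptive_sdk/rest/base_model.py")
data = path.read_bytes()
digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
print(f"{path},sha256={digest.decode()},{len(data)}")
```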