arize-phoenix 2.7.0__py3-none-any.whl → 2.8.0__py3-none-any.whl
This diff compares the contents of two publicly released versions of the package as they appear in their respective public registries, and is provided for informational purposes only.
- {arize_phoenix-2.7.0.dist-info → arize_phoenix-2.8.0.dist-info}/METADATA +5 -2
- {arize_phoenix-2.7.0.dist-info → arize_phoenix-2.8.0.dist-info}/RECORD +26 -26
- {arize_phoenix-2.7.0.dist-info → arize_phoenix-2.8.0.dist-info}/WHEEL +1 -1
- phoenix/exceptions.py +4 -0
- phoenix/experimental/evals/functions/classify.py +1 -1
- phoenix/experimental/evals/models/anthropic.py +27 -22
- phoenix/experimental/evals/models/base.py +1 -56
- phoenix/experimental/evals/models/bedrock.py +23 -13
- phoenix/experimental/evals/models/litellm.py +10 -17
- phoenix/experimental/evals/models/openai.py +46 -53
- phoenix/experimental/evals/models/vertex.py +19 -29
- phoenix/experimental/evals/models/vertexai.py +1 -20
- phoenix/server/api/schema.py +2 -3
- phoenix/server/static/index.js +557 -517
- phoenix/session/session.py +2 -1
- phoenix/trace/exporter.py +15 -11
- phoenix/trace/fixtures.py +10 -0
- phoenix/trace/llama_index/callback.py +5 -5
- phoenix/trace/llama_index/streaming.py +3 -4
- phoenix/trace/otel.py +49 -21
- phoenix/trace/schemas.py +2 -2
- phoenix/trace/span_json_decoder.py +5 -4
- phoenix/trace/tracer.py +6 -5
- phoenix/version.py +1 -1
- {arize_phoenix-2.7.0.dist-info → arize_phoenix-2.8.0.dist-info}/licenses/IP_NOTICE +0 -0
- {arize_phoenix-2.7.0.dist-info → arize_phoenix-2.8.0.dist-info}/licenses/LICENSE +0 -0
{arize_phoenix-2.7.0.dist-info → arize_phoenix-2.8.0.dist-info}/METADATA CHANGED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: arize-phoenix
-Version: 2.7.0
+Version: 2.8.0
 Summary: ML Observability in your notebook
 Project-URL: Documentation, https://docs.arize.com/phoenix/
 Project-URL: Issues, https://github.com/Arize-ai/phoenix/issues

@@ -86,6 +86,9 @@ Description-Content-Type: text/markdown
 <a target="_blank" href="https://pypi.org/project/arize-phoenix/">
 <img src="https://img.shields.io/pypi/pyversions/arize-phoenix">
 </a>
+<a target="_blank" href="https://hub.docker.com/repository/docker/arizephoenix/phoenix/general">
+<img src="https://img.shields.io/docker/v/arizephoenix/phoenix?sort=semver&logo=docker&label=image&color=blue">
+</a>
 </p>
 
 

@@ -134,7 +137,7 @@ pip install arize-phoenix[experimental]
 
 
 
-With the advent of powerful LLMs, it is now possible to build LLM Applications that can perform complex tasks like summarization, translation, question and answering, and more. However, these applications are often difficult to debug and troubleshoot as they have an extensive surface area: search and retrieval via vector stores, embedding generation, usage of external tools and so on. Phoenix provides a tracing framework that allows you to trace through the execution of your LLM Application hierarchically. This allows you to understand the internals of your LLM Application and to troubleshoot the complex components of your applicaition. Phoenix is built on top of the OpenInference tracing standard and uses it to trace, export, and collect critical information about your LLM Application in the form of `spans`. For more details on the OpenInference tracing standard, see the [OpenInference Specification](https://github.com/Arize-ai/
+With the advent of powerful LLMs, it is now possible to build LLM Applications that can perform complex tasks like summarization, translation, question and answering, and more. However, these applications are often difficult to debug and troubleshoot as they have an extensive surface area: search and retrieval via vector stores, embedding generation, usage of external tools and so on. Phoenix provides a tracing framework that allows you to trace through the execution of your LLM Application hierarchically. This allows you to understand the internals of your LLM Application and to troubleshoot the complex components of your applicaition. Phoenix is built on top of the OpenInference tracing standard and uses it to trace, export, and collect critical information about your LLM Application in the form of `spans`. For more details on the OpenInference tracing standard, see the [OpenInference Specification](https://github.com/Arize-ai/openinference)
 
 ### Tracing with LlamaIndex
 
{arize_phoenix-2.7.0.dist-info → arize_phoenix-2.8.0.dist-info}/RECORD CHANGED

@@ -1,10 +1,10 @@
 phoenix/__init__.py,sha256=EEh0vZGRQS8686h34GQ64OjQoZ7neKYO_iO5j6Oa9Jw,1402
 phoenix/config.py,sha256=RbQw8AkVyI4SSo5CD520AjUNcwkDNOGZA6_ErE48R7A,3454
 phoenix/datetime_utils.py,sha256=D955QLrkgrrSdUM6NyqbCeAu2SMsjhR5rHVQEsVUdng,2773
-phoenix/exceptions.py,sha256=
+phoenix/exceptions.py,sha256=X5k9ipUDfwSCwZB-H5zFJLas86Gf9tAx0W4l5TZxp5k,108
 phoenix/py.typed,sha256=AbpHGcgLb-kRsJGnwFEktk7uzpZOCcBY74-YBdrKVGs,1
 phoenix/services.py,sha256=f6AeyKTuOpy9RCcTCjVH3gx5nYZhbTMFOuv1WSUOB5o,4992
-phoenix/version.py,sha256=
+phoenix/version.py,sha256=z6im3C9Qb6qiQIpaJdE4f9WQiCnFGSUQnQXDPw_dvDg,22
 phoenix/core/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 phoenix/core/embedding_dimension.py,sha256=zKGbcvwOXgLf-yrJBpQyKtd-LEOPRKHnUToyAU8Owis,87
 phoenix/core/evals.py,sha256=gJyqQzpud5YjtoY8h4pgXvHDsdubGfqmEewLuZHPPmQ,10224

@@ -23,19 +23,19 @@ phoenix/experimental/evals/__init__.py,sha256=q96YKLMt2GJD9zL8sjugvWx1INfw40Wa7E
 phoenix/experimental/evals/evaluators.py,sha256=r7fXrS-l4gn58SUhLAZSfY3P8lxysouSVJwHddrZJ_Q,15956
 phoenix/experimental/evals/retrievals.py,sha256=o3fqrsYbYZjyGj_jWkN_9VQVyXjLkDKDw5Ws7l8bwdI,3828
 phoenix/experimental/evals/functions/__init__.py,sha256=NNd0-_cmIopdV7vm3rspjfgM726qoQJ4DPq_vqbnaxQ,180
-phoenix/experimental/evals/functions/classify.py,sha256=
+phoenix/experimental/evals/functions/classify.py,sha256=6yCajPT9i98b4_2qYn9ZxGhdI3CLhfUSrEyUUcqQqmQ,19517
 phoenix/experimental/evals/functions/executor.py,sha256=bM7PI2rcPukQQzZ2rWqN_-Kfo_a935YJj0bh1Red8Ps,13406
 phoenix/experimental/evals/functions/generate.py,sha256=8LnnPAjBM9yxitdkaGZ67OabuDTOWBF3fvinJ_uCFRg,5584
 phoenix/experimental/evals/functions/processing.py,sha256=F4xtLsulLV4a8CkuLldRddsCim75dSTIShEJUYN6I6w,1823
 phoenix/experimental/evals/models/__init__.py,sha256=j1N7DhiOPbcaemtVBONcQ0miNnGQwEXz4u3P3Vwe6-4,320
-phoenix/experimental/evals/models/anthropic.py,sha256=
-phoenix/experimental/evals/models/base.py,sha256=
-phoenix/experimental/evals/models/bedrock.py,sha256=
-phoenix/experimental/evals/models/litellm.py,sha256=
-phoenix/experimental/evals/models/openai.py,sha256=
+phoenix/experimental/evals/models/anthropic.py,sha256=BZmLvepkSMj_opCWsZoL34a3yAwRdl7qbJB86DFR84E,6688
+phoenix/experimental/evals/models/base.py,sha256=RWz_Jzj3Z1fENl2WUXIz-4eMsk6HfYXc0K8IZ-BJss4,6306
+phoenix/experimental/evals/models/bedrock.py,sha256=nVOXRZr-iDwHEINozpO2bqZR2KEeDHNyj6jgQPONQYs,8565
+phoenix/experimental/evals/models/litellm.py,sha256=0c-eJFsx41W0MsqeUd4UPquLBKSZp3BRNhKhX2uFCAs,4123
+phoenix/experimental/evals/models/openai.py,sha256=NUWywf2PmHi9IbQ0MK6_An1hZYE5Sr8ngKoLD3MGrjU,17298
 phoenix/experimental/evals/models/rate_limiters.py,sha256=5GVN0RQKt36Przg3-9jLgocRmyg-tbeO-cdbuLIx89w,10160
-phoenix/experimental/evals/models/vertex.py,sha256=
-phoenix/experimental/evals/models/vertexai.py,sha256=
+phoenix/experimental/evals/models/vertex.py,sha256=1VAGJNoiUm56pP8G9Qvnf-4_Rl9u9NI7ToOKbWFNtpk,6226
+phoenix/experimental/evals/models/vertexai.py,sha256=_txsOP2RHyR3AnugeJRFUNvYm3xXvfMbWpULxTko4OA,4821
 phoenix/experimental/evals/templates/__init__.py,sha256=GSJSoWJ4jwyoUANniidmWMUtXQhNQYbTJbfFqCvuYuo,1470
 phoenix/experimental/evals/templates/default_templates.py,sha256=dVKmoLwqgAyGcRuezz9WKnXSHhw7-qk1R8j6wSmqh0s,20722
 phoenix/experimental/evals/templates/template.py,sha256=ImFSaTPo9oalPNwq7cNdOCndrvuwLuIyIFKsgDVcoJE,6715

@@ -65,7 +65,7 @@ phoenix/server/api/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuF
 phoenix/server/api/context.py,sha256=02vRgyLFpDCmh97QwsjWD5cdNZkoCUtDPPs1YItbdbI,583
 phoenix/server/api/helpers.py,sha256=_V1eVkchZmTkhOfRC4QqR1sUB2xtIxdsMJkDouZq_IE,251
 phoenix/server/api/interceptor.py,sha256=do_J4HjPPQ_C7bMmqe1YpTmt_hoxcwC2I8P3n5sZBo4,1302
-phoenix/server/api/schema.py,sha256=
+phoenix/server/api/schema.py,sha256=lEahYCASRgRTw6nOme7zQtyKaVbHqK5CQUbg5XTT5nU,15293
 phoenix/server/api/input_types/ClusterInput.py,sha256=EL4ftvZxQ8mVdruUPcdhMhByORmSmM8S-X6RPqU6GX0,179
 phoenix/server/api/input_types/Coordinates.py,sha256=meTwbIjwTfqx5DGD2DBlH9wQzdQVNM5a8x9dp1FfIgA,173
 phoenix/server/api/input_types/DataQualityMetricInput.py,sha256=LazvmQCCM5m9SDZTpyxQXO1rYF4cmsc3lsR2S9S65X4,1292

@@ -125,26 +125,26 @@ phoenix/server/static/apple-touch-icon-76x76.png,sha256=CT_xT12I0u2i0WU8JzBZBuOQ
 phoenix/server/static/apple-touch-icon.png,sha256=fOfpjqGpWYbJ0eAurKsyoZP1EAs6ZVooBJ_SGk2ZkDs,3801
 phoenix/server/static/favicon.ico,sha256=bY0vvCKRftemZfPShwZtE93DiiQdaYaozkPGwNFr6H8,34494
 phoenix/server/static/index.css,sha256=KKGpx4iwF91VGRm0YN-4cn8oC-oIqC6HecoPf0x3ZM8,1885
-phoenix/server/static/index.js,sha256=
+phoenix/server/static/index.js,sha256=tbeJsyK4L19pFLbl2H4eBCk1JpTQWa8f5m_YJoRXOG4,3140434
 phoenix/server/static/modernizr.js,sha256=mvK-XtkNqjOral-QvzoqsyOMECXIMu5BQwSVN_wcU9c,2564
 phoenix/server/templates/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 phoenix/server/templates/index.html,sha256=DlfcGoq1V5C2QkJWqP1j4Nu6_kPfsOzOrtzYF3ogghE,1900
 phoenix/session/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 phoenix/session/evaluation.py,sha256=DaAtA0XYJbXRJO_StGywa-9APlz2ORSmCXzxrtn3rvI,4997
-phoenix/session/session.py,sha256=
+phoenix/session/session.py,sha256=1kwqPHPyzdlsAsyQ4ZKlvdE0rnt1K8TWtB3KzbtOyP4,20862
 phoenix/trace/__init__.py,sha256=4d_MqzUIFmlY9WWcFeTONJ4xL5mPGoWZaPM2TJ0ZDBQ,266
 phoenix/trace/errors.py,sha256=DbXSJnNErV7305tKv7pUWLD6jcVHJ6EBdSu4mZJ6IM4,112
 phoenix/trace/evaluation_conventions.py,sha256=t8jydM3U0-T5YpiQKRJ3tWdWGlHtzKyttYdw-ddvPOk,1048
-phoenix/trace/exporter.py,sha256=
-phoenix/trace/fixtures.py,sha256=
-phoenix/trace/otel.py,sha256=
-phoenix/trace/schemas.py,sha256=
+phoenix/trace/exporter.py,sha256=jH8jp1Ikt6BmZGElpTG1F3b0yYDm9WSWLFpxHnKiMtY,4409
+phoenix/trace/fixtures.py,sha256=LokNedhbGYxpzXznteO4m5QehvNYjzvoh231-CMJQeY,7113
+phoenix/trace/otel.py,sha256=9oum5RPCsEZvKg41mEy8aKDcXHBwtR-P9eeqEXp-ts4,14642
+phoenix/trace/schemas.py,sha256=fYrhC0sTlw6vilsQexSmyhvifnT7SajMxWLMAQTxv4E,5398
 phoenix/trace/semantic_conventions.py,sha256=u6NG85ZhbreriZr8cqJaddldM_jUcew7JilszY7JUk8,4652
 phoenix/trace/span_evaluations.py,sha256=asGug9lUHUufBwK1nL_PnHIDKsOc5X4ws7cur9lfoyI,12421
-phoenix/trace/span_json_decoder.py,sha256=
+phoenix/trace/span_json_decoder.py,sha256=nrIPkcgbCcNML-0OSjWC6fxIfBEMiP0n67yM_m-vegg,3068
 phoenix/trace/span_json_encoder.py,sha256=C5y7rkyOcV08oJC5t8TZqVxsKCZMJKad7bBQzAgLoDs,1763
 phoenix/trace/trace_dataset.py,sha256=KW0TzmhlKuX8PUPLV172iTK08myYE0QXUC75KiIqJ7k,13204
-phoenix/trace/tracer.py,sha256=
+phoenix/trace/tracer.py,sha256=AoYyWRco-EcvK7TASmZO0z-nJEm3cXlG9lhTWDTz4VU,3691
 phoenix/trace/utils.py,sha256=7LurVGXn245cjj4MJsc7v6jq4DSJkpK6YGBfIaSywuw,1307
 phoenix/trace/dsl/__init__.py,sha256=WIQIjJg362XD3s50OsPJJ0xbDsGp41bSv7vDllLrPuA,144
 phoenix/trace/dsl/filter.py,sha256=2vHtKAvq8OAFlXNDE4qxPEEUpda39tC8xy0gDK9SN4I,12696

@@ -155,9 +155,9 @@ phoenix/trace/langchain/__init__.py,sha256=vAjrmrreetV7L5IL8VH_9efG9VJunJTgT0iKy
 phoenix/trace/langchain/instrumentor.py,sha256=HkNKbFNclTYjRXBM8qU4qvZHdyw06J9bhwgE7JnqbNI,1323
 phoenix/trace/langchain/tracer.py,sha256=1Oz3orSDpZX1pZKwtZbeM_f9tiAhQb7Of8ARjRlKVQY,16827
 phoenix/trace/llama_index/__init__.py,sha256=wCcQgD9CG5TA8i-1XsSed4ZzwHTUmqZwegQAV_FqEng,178
-phoenix/trace/llama_index/callback.py,sha256=
+phoenix/trace/llama_index/callback.py,sha256=cSa5whoaMDdBc7W2QSWWatMoNL-wKU2fozkP8prpUMQ,27563
 phoenix/trace/llama_index/debug_callback.py,sha256=SKToD9q_QADSGTJ5lhilqRVKaUnUSRXUvURCzN4by2U,1367
-phoenix/trace/llama_index/streaming.py,sha256=
+phoenix/trace/llama_index/streaming.py,sha256=yt_kB0LJK6lGdARtivmEmkZgbnzFUqIHfSN0hjYbTpM,3248
 phoenix/trace/openai/__init__.py,sha256=J3G0uqCxGdksUpaQVHds_Egv2drvh8UEqoLjiQAOveg,79
 phoenix/trace/openai/instrumentor.py,sha256=H1T2_1uqeH2lKCKeMmirEUl6PRtHQlQTXfsLR_hwDFM,24948
 phoenix/trace/v1/__init__.py,sha256=-IbAD0ruESMjvQLvGAg9CTfjBUATFDx1OXseDPis6-0,88

@@ -166,8 +166,8 @@ phoenix/trace/v1/evaluation_pb2.pyi,sha256=cCbbx06gwQmaH14s3J1X25TtaARh-k1abbxQd
 phoenix/utilities/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 phoenix/utilities/error_handling.py,sha256=7b5rpGFj9EWZ8yrZK1IHvxB89suWk3lggDayUQcvZds,1946
 phoenix/utilities/logging.py,sha256=lDXd6EGaamBNcQxL4vP1au9-i_SXe0OraUDiJOcszSw,222
-arize_phoenix-2.
-arize_phoenix-2.
-arize_phoenix-2.
-arize_phoenix-2.
-arize_phoenix-2.
+arize_phoenix-2.8.0.dist-info/METADATA,sha256=-anApFNW1PtrZU7EHCCg8JD-LDLfFQJZv9mUQfobfnE,26703
+arize_phoenix-2.8.0.dist-info/WHEEL,sha256=TJPnKdtrSue7xZ_AVGkp9YXcvDrobsjBds1du3Nx6dc,87
+arize_phoenix-2.8.0.dist-info/licenses/IP_NOTICE,sha256=JBqyyCYYxGDfzQ0TtsQgjts41IJoa-hiwDrBjCb9gHM,469
+arize_phoenix-2.8.0.dist-info/licenses/LICENSE,sha256=HFkW9REuMOkvKRACuwLPT0hRydHb3zNg-fdFt94td18,3794
+arize_phoenix-2.8.0.dist-info/RECORD,,
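Each RECORD row has the form path,sha256=<digest>,<size>: the digest is the unpadded URL-safe base64 encoding of the file's SHA-256, per the wheel RECORD convention (PEP 376 / PEP 427). A small sketch of how an entry above can be checked against an unpacked wheel:

    # Recompute a RECORD digest: unpadded urlsafe-base64 of the file's SHA-256.
    import base64
    import hashlib

    def record_digest(path: str) -> str:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).digest()
        return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

    # For the 2.8.0 wheel, record_digest("phoenix/version.py") should equal
    # "z6im3C9Qb6qiQIpaJdE4f9WQiCnFGSUQnQXDPw_dvDg" (and the file is 22 bytes).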
phoenix/exceptions.py CHANGED
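The four added lines of this new module are not reproduced here, but the `from phoenix.exceptions import PhoenixContextLimitExceeded` imports introduced throughout this diff imply a minimal hierarchy along these lines (a sketch only; any name beyond `PhoenixContextLimitExceeded` is an assumption):

    # Plausible sketch of phoenix/exceptions.py (108 bytes per RECORD).
    # Only PhoenixContextLimitExceeded is confirmed by the imports in this diff;
    # the PhoenixException base class is an assumption.
    class PhoenixException(Exception):
        ...


    class PhoenixContextLimitExceeded(PhoenixException):
        ...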
phoenix/experimental/evals/functions/classify.py CHANGED

@@ -249,7 +249,7 @@ def run_relevance_eval(
 
     This latter format is intended for running evaluations on exported OpenInference trace
     dataframes. For more information on the OpenInference tracing specification, see
-    https://github.com/Arize-ai/
+    https://github.com/Arize-ai/openinference/.
 
     model (BaseEvalModel): The model used for evaluation.
 
phoenix/experimental/evals/models/anthropic.py CHANGED

@@ -1,6 +1,7 @@
 from dataclasses import dataclass, field
 from typing import TYPE_CHECKING, Any, Dict, List, Optional
 
+from phoenix.exceptions import PhoenixContextLimitExceeded
 from phoenix.experimental.evals.models.base import BaseEvalModel
 from phoenix.experimental.evals.models.rate_limiters import RateLimiter
 

@@ -44,12 +45,6 @@ class AnthropicModel(BaseEvalModel):
         self._init_client()
         self._init_tiktoken()
         self._init_rate_limiter()
-        self.retry = self._retry(
-            error_types=[],  # default to catching all errors
-            min_seconds=self.retry_min_seconds,
-            max_seconds=self.retry_max_seconds,
-            max_retries=self.max_retries,
-        )
 
     def _init_environment(self) -> None:
         try:

@@ -127,7 +122,7 @@ class AnthropicModel(BaseEvalModel):
         kwargs.pop("instruction", None)
         invocation_parameters = self.invocation_parameters()
         invocation_parameters.update(kwargs)
-        response = self.
+        response = self._rate_limited_completion(
             model=self.model,
             prompt=self._format_prompt_for_claude(prompt),
             **invocation_parameters,

@@ -135,14 +130,19 @@ class AnthropicModel(BaseEvalModel):
 
         return str(response)
 
-    def 
-        @self.retry
+    def _rate_limited_completion(self, **kwargs: Any) -> Any:
         @self._rate_limiter.limit
-        def 
-
-
-
-
+        def _completion(**kwargs: Any) -> Any:
+            try:
+                response = self.client.completions.create(**kwargs)
+                return response.completion
+            except self._anthropic.BadRequestError as e:
+                exception_message = e.args[0]
+                if exception_message and "prompt is too long" in exception_message:
+                    raise PhoenixContextLimitExceeded(exception_message) from e
+                raise e
+
+        return _completion(**kwargs)
 
     async def _async_generate(self, prompt: str, **kwargs: Dict[str, Any]) -> str:
         # instruction is an invalid input to Anthropic models, it is passed in by

@@ -150,20 +150,25 @@ class AnthropicModel(BaseEvalModel):
         kwargs.pop("instruction", None)
         invocation_parameters = self.invocation_parameters()
         invocation_parameters.update(kwargs)
-        response = await self.
+        response = await self._async_rate_limited_completion(
             model=self.model, prompt=self._format_prompt_for_claude(prompt), **invocation_parameters
         )
 
         return str(response)
 
-    async def 
-        @self.retry
+    async def _async_rate_limited_completion(self, **kwargs: Any) -> Any:
         @self._rate_limiter.alimit
-        async def 
-
-
-
-
+        async def _async_completion(**kwargs: Any) -> Any:
+            try:
+                response = await self.async_client.completions.create(**kwargs)
+                return response.completion
+            except self._anthropic.BadRequestError as e:
+                exception_message = e.args[0]
+                if exception_message and "prompt is too long" in exception_message:
+                    raise PhoenixContextLimitExceeded(exception_message) from e
+                raise e
+
+        return await _async_completion(**kwargs)
 
     def _format_prompt_for_claude(self, prompt: str) -> str:
         # Claude requires prompt in the format of Human: ... Assistant:
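The net effect of the Anthropic changes: the tenacity retry wrapper (removed from the base class below) no longer guards the call, and a too-long prompt surfaces as a typed PhoenixContextLimitExceeded instead of being retried pointlessly. A minimal caller-side sketch under that reading; the constructor argument shown is illustrative, not taken from this diff:

    # Sketch: callers can now catch the context-limit case specifically.
    # `model="claude-2"` is an illustrative argument, not confirmed here.
    from phoenix.exceptions import PhoenixContextLimitExceeded
    from phoenix.experimental.evals.models.anthropic import AnthropicModel

    llm = AnthropicModel(model="claude-2")
    try:
        output = llm("Classify the retrieved document as relevant or irrelevant ...")
    except PhoenixContextLimitExceeded:
        # Retrying cannot help here; the prompt must be shortened first.
        output = llm("A shorter classification prompt ...")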
phoenix/experimental/evals/models/base.py CHANGED

@@ -2,22 +2,13 @@ import logging
 from abc import ABC, abstractmethod, abstractproperty
 from contextlib import contextmanager
 from dataclasses import dataclass, field
-from typing import TYPE_CHECKING, Any, 
+from typing import TYPE_CHECKING, Any, Generator, List, Optional, Sequence
 
 from phoenix.experimental.evals.models.rate_limiters import RateLimiter
 
 if TYPE_CHECKING:
     from tiktoken import Encoding
 
-
-from tenacity import (
-    RetryCallState,
-    retry,
-    retry_base,
-    retry_if_exception_type,
-    stop_after_attempt,
-    wait_random_exponential,
-)
 from tqdm.asyncio import tqdm_asyncio
 from tqdm.auto import tqdm
 from typing_extensions import TypeVar

@@ -65,52 +56,6 @@ class BaseEvalModel(ABC):
     def reload_client(self) -> None:
         pass
 
-    def _retry(
-        self,
-        error_types: List[Type[BaseException]],
-        min_seconds: int,
-        max_seconds: int,
-        max_retries: int,
-    ) -> Callable[[Any], Any]:
-        """Create a retry decorator for a given LLM and provided list of error types."""
-
-        def log_retry(retry_state: RetryCallState) -> None:
-            if fut := retry_state.outcome:
-                exc = fut.exception()
-            else:
-                exc = None
-
-            if exc:
-                printif(
-                    self._verbose,
-                    (
-                        f"Failed attempt {retry_state.attempt_number}: "
-                        f"{type(exc).__module__}.{type(exc).__name__}"
-                    ),
-                )
-                printif(
-                    True,
-                    f"Failed attempt {retry_state.attempt_number}: raised {repr(exc)}",
-                )
-            else:
-                printif(True, f"Failed attempt {retry_state.attempt_number}")
-            return None
-
-        if not error_types:
-            # default to retrying on all exceptions
-            error_types = [Exception]
-
-        retry_instance: retry_base = retry_if_exception_type(error_types[0])
-        for error in error_types[1:]:
-            retry_instance = retry_instance | retry_if_exception_type(error)
-        return retry(
-            reraise=True,
-            stop=stop_after_attempt(max_retries),
-            wait=wait_random_exponential(multiplier=1, min=min_seconds, max=max_seconds),
-            retry=retry_instance,
-            before_sleep=log_retry,
-        )
-
     def __call__(self, prompt: str, instruction: Optional[str] = None, **kwargs: Any) -> str:
         """Run the LLM on the given prompt."""
         if not isinstance(prompt, str):
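For reference, the deleted _retry helper built a standard tenacity decorator: retry on the listed error types (or on everything when the list was empty), exponential backoff with random jitter between min_seconds and max_seconds, at most max_retries attempts, re-raising the last failure. A condensed sketch of the same behavior using the tenacity API the removed imports point at (parameter values illustrative, logging callback omitted):

    # Condensed equivalent of the removed BaseEvalModel._retry.
    from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

    retrying = retry(
        reraise=True,                                                # re-raise the final failure
        stop=stop_after_attempt(10),                                 # max_retries
        wait=wait_random_exponential(multiplier=1, min=5, max=60),   # min/max_seconds
        retry=retry_if_exception_type(Exception),                    # empty error_types -> retry everything
    )

    @retrying
    def call_llm() -> str:
        raise TimeoutError("transient failure")  # would be retried, then re-raised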
phoenix/experimental/evals/models/bedrock.py CHANGED

@@ -3,6 +3,7 @@ import logging
 from dataclasses import dataclass, field
 from typing import TYPE_CHECKING, Any, Dict, List, Optional
 
+from phoenix.exceptions import PhoenixContextLimitExceeded
 from phoenix.experimental.evals.models.base import BaseEvalModel
 from phoenix.experimental.evals.models.rate_limiters import RateLimiter
 

@@ -54,12 +55,6 @@ class BedrockModel(BaseEvalModel):
         self._init_client()
         self._init_tiktoken()
         self._init_rate_limiter()
-        self.retry = self._retry(
-            error_types=[],  # default to catching all errors
-            min_seconds=self.retry_min_seconds,
-            max_seconds=self.retry_max_seconds,
-            max_retries=self.max_retries,
-        )
 
     def _init_environment(self) -> None:
         try:

@@ -130,21 +125,36 @@ class BedrockModel(BaseEvalModel):
         accept = "application/json"
         contentType = "application/json"
 
-        response = self.
+        response = self._rate_limited_completion(
             body=body, modelId=self.model_id, accept=accept, contentType=contentType
         )
 
         return self._parse_output(response) or ""
 
-    def 
+    def _rate_limited_completion(self, **kwargs: Any) -> Any:
         """Use tenacity to retry the completion call."""
 
-        @self.retry
         @self._rate_limiter.limit
-        def 
-
-
-
+        def _completion(**kwargs: Any) -> Any:
+            try:
+                return self.client.invoke_model(**kwargs)
+            except Exception as e:
+                exception_message = e.args[0]
+                if not exception_message:
+                    raise e
+
+                if "Input is too long" in exception_message:
+                    # Error from Anthropic models
+                    raise PhoenixContextLimitExceeded(exception_message) from e
+                elif "expected maxLength" in exception_message:
+                    # Error from Titan models
+                    raise PhoenixContextLimitExceeded(exception_message) from e
+                elif "Prompt has too many tokens" in exception_message:
+                    # Error from AI21 models
+                    raise PhoenixContextLimitExceeded(exception_message) from e
+                raise e
+
+        return _completion(**kwargs)
 
     def _format_prompt_for_claude(self, prompt: str) -> str:
         # Claude requires prompt in the format of Human: ... Assisatnt:
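Because Bedrock fronts several model families behind a single invoke_model API, the only signal that a prompt overflowed the context window is the provider-specific message text, which is why the new _completion matches on substrings. The same dispatch in isolation (the helper name is hypothetical; the marker strings are copied from the diff):

    # Hypothetical helper isolating the substring dispatch used above.
    CONTEXT_LIMIT_MARKERS = (
        "Input is too long",           # Anthropic models on Bedrock
        "expected maxLength",          # Titan models
        "Prompt has too many tokens",  # AI21 models
    )

    def is_context_limit_error(message: str) -> bool:
        return any(marker in message for marker in CONTEXT_LIMIT_MARKERS)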
phoenix/experimental/evals/models/litellm.py CHANGED

@@ -95,24 +95,17 @@ class LiteLLMModel(BaseEvalModel):
 
     def _generate(self, prompt: str, **kwargs: Dict[str, Any]) -> str:
         messages = self._get_messages_from_prompt(prompt)
-
-        self.
-
-
-
-
-
-
-
-            **self.model_kwargs,
-        )
+        response = self._litellm.completion(
+            model=self.model_name,
+            messages=messages,
+            temperature=self.temperature,
+            max_tokens=self.max_tokens,
+            top_p=self.top_p,
+            num_retries=self.num_retries,
+            request_timeout=self.request_timeout,
+            **self.model_kwargs,
         )
-
-    def _generate_with_retry(self, **kwargs: Any) -> Any:
-        # Using default LiteLLM completion with retries = self.num_retries.
-
-        response = self._litellm.completion(**kwargs)
-        return response.choices[0].message.content
+        return str(response.choices[0].message.content)
 
     def _get_messages_from_prompt(self, prompt: str) -> List[Dict[str, str]]:
         # LiteLLM requires prompts in the format of messages
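Here the separate retry indirection disappears entirely: retrying is delegated to LiteLLM itself via the num_retries argument of litellm.completion. A standalone sketch of such a call (the parameter values are illustrative, not the class defaults):

    # Standalone sketch of delegating retries to LiteLLM via num_retries.
    import litellm

    response = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Is the retrieved document relevant?"}],
        temperature=0.0,
        max_tokens=256,
        top_p=1.0,
        num_retries=2,        # LiteLLM retries transient failures internally
        request_timeout=60,
    )
    print(response.choices[0].message.content)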
phoenix/experimental/evals/models/openai.py CHANGED

@@ -14,6 +14,7 @@ from typing import (
     get_origin,
 )
 
+from phoenix.exceptions import PhoenixContextLimitExceeded
 from phoenix.experimental.evals.models.base import BaseEvalModel
 from phoenix.experimental.evals.models.rate_limiters import RateLimiter
 

@@ -114,25 +115,11 @@ class OpenAIModel(BaseEvalModel):
 
     def _init_environment(self) -> None:
         try:
-            import httpx
             import openai
             import openai._utils as openai_util
 
             self._openai = openai
             self._openai_util = openai_util
-            self._openai_retry_errors = [
-                self._openai.APITimeoutError,
-                self._openai.APIError,
-                self._openai.APIConnectionError,
-                self._openai.InternalServerError,
-                httpx.ReadTimeout,
-            ]
-            self.retry = self._retry(
-                error_types=self._openai_retry_errors,
-                min_seconds=self.retry_min_seconds,
-                max_seconds=self.retry_max_seconds,
-                max_retries=self.max_retries,
-            )
         except ImportError:
             self._raise_import_error(
                 package_display_name="OpenAI",

@@ -265,7 +252,7 @@ class OpenAIModel(BaseEvalModel):
             invoke_params["functions"] = functions
         if function_call := kwargs.get("function_call"):
             invoke_params["function_call"] = function_call
-        response = await self.
+        response = await self._async_rate_limited_completion(
             messages=messages,
             **invoke_params,
         )

@@ -284,7 +271,7 @@ class OpenAIModel(BaseEvalModel):
             invoke_params["functions"] = functions
         if function_call := kwargs.get("function_call"):
             invoke_params["function_call"] = function_call
-        response = self.
+        response = self._rate_limited_completion(
             messages=messages,
             **invoke_params,
         )

@@ -296,45 +283,51 @@ class OpenAIModel(BaseEvalModel):
             return str(function_call.get("arguments") or "")
         return str(message["content"])
 
-    async def 
-        """Use tenacity to retry the completion call."""
-
-        @self.retry
+    async def _async_rate_limited_completion(self, **kwargs: Any) -> Any:
         @self._rate_limiter.alimit
-        async def 
-
-        if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+        async def _async_completion(**kwargs: Any) -> Any:
+            try:
+                if self._model_uses_legacy_completion_api:
+                    if "prompt" not in kwargs:
+                        kwargs["prompt"] = "\n\n".join(
+                            (message.get("content") or "")
+                            for message in (kwargs.pop("messages", None) or ())
+                        )
+                    # OpenAI 1.0.0 API responses are pydantic objects, not dicts
+                    # We must dump the model to get the dict
+                    res = await self._async_client.completions.create(**kwargs)
+                else:
+                    res = await self._async_client.chat.completions.create(**kwargs)
+                return res.model_dump()
+            except self._openai._exceptions.BadRequestError as e:
+                exception_message = e.args[0]
+                if exception_message and "maximum context length" in exception_message:
+                    raise PhoenixContextLimitExceeded(exception_message) from e
+                raise e
+
+        return await _async_completion(**kwargs)
+
+    def _rate_limited_completion(self, **kwargs: Any) -> Any:
         @self._rate_limiter.limit
-        def 
-
-        if
-
-
-
-
-
-
-
-
-
+        def _completion(**kwargs: Any) -> Any:
+            try:
+                if self._model_uses_legacy_completion_api:
+                    if "prompt" not in kwargs:
+                        kwargs["prompt"] = "\n\n".join(
+                            (message.get("content") or "")
+                            for message in (kwargs.pop("messages", None) or ())
+                        )
+                    # OpenAI 1.0.0 API responses are pydantic objects, not dicts
+                    # We must dump the model to get the dict
+                    return self._client.completions.create(**kwargs).model_dump()
+                return self._client.chat.completions.create(**kwargs).model_dump()
+            except self._openai._exceptions.BadRequestError as e:
+                exception_message = e.args[0]
+                if exception_message and "maximum context length" in exception_message:
+                    raise PhoenixContextLimitExceeded(exception_message) from e
+                raise e
+
+        return _completion(**kwargs)
 
     @property
     def max_context_size(self) -> int: