copilot-api-plus 1.2.15 → 1.2.17

package/README.en.md CHANGED
@@ -49,7 +49,7 @@ English | [简体中文](README.md)
  | 🛡️ **Network Resilience** | 120s timeout + smart retry + instant stream recovery + proxy tunnel keepalive (45s heartbeat) |
  | ✂️ **Context Passthrough** | Full context passthrough to upstream API; clients (e.g. Claude Code) manage compression |
  | 🔍 **Smart Model Matching** | Handles model name format differences (date suffixes, dash/dot versions, etc.) |
- | 🧠 **Thinking Chain** | Automatically enables deep thinking (reasoning_effort) with Anthropic adaptive/enabled mode auto-translation |
+ | 🧠 **Thinking Chain** | Automatically enables deep thinking for supported models, improving code quality |
 
  ---
 
@@ -583,12 +583,12 @@ Each API request outputs a log line with model name, status code, and duration:
  Built-in connection timeout and smart retry for upstream API requests, minimizing Copilot request credit consumption:
 
  - **Connection timeout**: 120 seconds for the first attempt, 30 seconds for retries (headers typically arrive in 3–5s)
- - **Retry strategy**: Up to 2 retries (3 total attempts), 2-3 second delays
+ - **Retry strategy**: Up to 1 retry (2 total attempts), 2-second delay. **Timeout errors are never retried** — a timeout means the request likely reached Copilot and a credit was already consumed
  - **Instant stream recovery**: On SSE stream interruption, immediately destroys the connection pool so the next request uses fresh sockets — recovery drops from ~135s to seconds
  - **Connection pool reset**: Automatically destroys all pooled connections on the first network error and creates fresh instances, preventing retries from hitting stale sockets
  - **Proxy tunnel keepalive**: Sends lightweight heartbeat requests every 45s while SSE streams are active, preventing proxy nodes from killing CONNECT tunnels due to inactivity
  - **HTTP/2 support**: Enables HTTP/2 protocol for better multiplexing performance
- - Only retries network-layer errors (timeout, TLS disconnect, connection reset, etc.); HTTP error codes (e.g. 400/500) are not retried
+ - Only retries network-layer connection errors (TLS disconnect, connection reset, etc.); timeout and HTTP error codes (e.g. 400/500) are not retried
  - SSE stream interruptions gracefully send error events to the client
 
  ---
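The connection pool reset and instant stream recovery bullets both come down to one operation: throw away every pooled socket and continue with a fresh pool. A minimal JavaScript sketch of that idea follows. `PooledConnections`, the `createAgent` factory, and the agent's `destroy()` method are hypothetical stand-ins; the package's actual `resetConnections` implementation is not part of this diff.

```javascript
// Hypothetical sketch: one shared pool that can be thrown away and rebuilt.
class PooledConnections {
  /** @param {() => { destroy(): void }} createAgent assumed factory for a fresh connection pool */
  constructor(createAgent) {
    this.createAgent = createAgent;
    this.agent = createAgent();
    this.generation = 0; // number of times the pool has been rebuilt
  }

  /** The pool the next outgoing request should use. */
  current() {
    return this.agent;
  }

  /** Close every pooled socket and swap in a new pool, so no request reuses a dead socket. */
  reset() {
    this.agent.destroy();
    this.agent = this.createAgent();
    this.generation += 1;
  }
}
```

In this picture, both the first network error in the retry loop and an SSE stream interruption would call `reset()`, so the next request never lands on a socket the proxy has already dropped.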
package/README.md CHANGED
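@@ -50,7 +50,7 @@
  | 🛡️ **Network Resilience** | 120s connection timeout + smart retry + instant recovery from stream interruptions + proxy tunnel keepalive (45s heartbeat) |
  | ✂️ **Context Passthrough** | Passes the full context through to the upstream API; the client (e.g. Claude Code) manages compression itself |
  | 🔍 **Smart Model Matching** | Automatically handles model name format differences (date suffixes, dash/dot version numbers, etc.) |
- | 🧠 **Thinking Chain** | Automatically enables deep thinking (reasoning_effort) for supported models, with automatic conversion of Anthropic adaptive/enabled modes |
+ | 🧠 **Thinking Chain** | Automatically enables deep thinking for supported models, improving code quality |
 
  ---
 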
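@@ -746,12 +746,12 @@ Anthropic-format model names (e.g. `claude-opus-4-6`) and Copilot's model list
  Requests to the upstream API include a built-in connection timeout and smart retry, to minimize the number of Copilot requests consumed:
 
  - **Connection timeout**: 120 seconds for the first request, 30 seconds for retries (response headers usually arrive within 3–5 seconds)
- - **Retry strategy**: At most 2 retries (3 attempts in total), with a 2-3 second interval
+ - **Retry strategy**: At most 1 retry (2 attempts in total), with a 2-second interval; **timeout errors are not retried** (a timeout means the request has already reached Copilot and the credit has been consumed)
  - **Instant stream recovery**: When an SSE stream is interrupted, the connection pool is destroyed immediately and the next request uses a fresh connection; recovery time drops from ~135 seconds to a few seconds
  - **Connection pool reset**: After the first network error, all connections are automatically destroyed and new instances created, so later requests do not reuse broken connections
  - **Proxy tunnel keepalive**: While SSE streams are being transmitted, a lightweight heartbeat request is sent every 45 seconds so that proxy nodes do not kill the CONNECT tunnel for being idle
  - **HTTP/2 support**: Enables the HTTP/2 protocol for better multiplexing performance
- - Only network-layer errors (timeouts, TLS disconnects, connection resets, etc.) are retried; HTTP error codes (e.g. 400/500) are not retried
+ - Only network-layer connection errors (TLS disconnects, connection resets, etc.) are retried; timeouts and HTTP error codes (e.g. 400/500) are not retried
  - When SSE streaming is interrupted, an error event is gracefully sent to the client
 
  ---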
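The proxy tunnel keepalive bullet describes a timer that runs only while at least one SSE stream is open. The sketch below assumes a reference count of active streams and a caller-supplied `sendHeartbeat` function; the class name and the shape of the heartbeat request are illustrative assumptions, since the diff does not show how the package actually issues its heartbeat.

```javascript
// Hypothetical sketch: run a heartbeat timer only while SSE streams are active.
class TunnelKeepalive {
  /**
   * @param {() => unknown} sendHeartbeat assumed callback that issues one lightweight request
   * @param {number} intervalMs heartbeat period; the README states 45 s
   */
  constructor(sendHeartbeat, intervalMs = 45_000) {
    this.sendHeartbeat = sendHeartbeat;
    this.intervalMs = intervalMs;
    this.activeStreams = 0;
    this.timer = null;
  }

  /** Call when an SSE stream opens; the first open stream starts the timer. */
  streamStarted() {
    this.activeStreams += 1;
    if (this.timer === null) {
      this.timer = setInterval(() => {
        // Swallow sync throws and async rejections: a failed heartbeat must not hurt the stream.
        Promise.resolve()
          .then(() => this.sendHeartbeat())
          .catch(() => {});
      }, this.intervalMs);
    }
  }

  /** Call when an SSE stream ends or fails; the last one to close stops the timer. */
  streamEnded() {
    this.activeStreams = Math.max(0, this.activeStreams - 1);
    if (this.activeStreams === 0 && this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}
```

Routing the callback through `Promise.resolve().then(...)` means a heartbeat that throws or rejects is discarded instead of surfacing as an unhandled error while the real stream is still running.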
package/dist/main.js CHANGED
@@ -1768,12 +1768,14 @@ async function checkRateLimit(state) {
  const FETCH_TIMEOUT_MS = 12e4;
  /**
  * Retry delays in ms. After the first failure the connection pool is reset
- * (see `resetConnections`), so retries use fresh sockets. We allow up to
- * 2 retries because SSE streams through HTTP proxies are frequently
- * interrupted during long model thinking phases (~60 s idle timeout on
- * many proxy nodes). Keeping the delay short avoids wasting wall-clock time.
+ * (see `resetConnections`), so retries use fresh sockets.
+ *
+ * We only allow 1 retry (2 total attempts) to minimize credit waste.
+ * Timeout errors are NOT retried at all; they indicate the request likely
+ * reached Copilot (consuming a credit) and the upstream is slow or the
+ * proxy killed the connection mid-flight.
  */
- const RETRY_DELAYS = [2e3, 3e3];
+ const RETRY_DELAYS = [2e3];
  /**
  * Timeout for retry attempts (waiting for response headers only).
  * Response headers typically arrive within 3–5 s, even on slow models.
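The two timeouts (120 s for the first attempt via `FETCH_TIMEOUT_MS`, 30 s for retries per the README) apply only until response headers arrive. One way to express that, sketched here with `AbortController`, is to abort the attempt if the supplied fetch function has not resolved in time and to clear the timer as soon as it does. `withHeaderTimeout` and `timeoutForAttempt` are hypothetical names; the package's own `fetchWithTimeout` is not shown in this diff. The error message deliberately contains "timed out" because that is the string the new retry-loop check matches.

```javascript
// Hypothetical sketch: a per-attempt header timeout built on AbortController.
const FIRST_ATTEMPT_TIMEOUT_MS = 120_000; // same value as FETCH_TIMEOUT_MS = 12e4
const RETRY_ATTEMPT_TIMEOUT_MS = 30_000; // README: 30 s for retries

function timeoutForAttempt(attempt) {
  return attempt === 0 ? FIRST_ATTEMPT_TIMEOUT_MS : RETRY_ATTEMPT_TIMEOUT_MS;
}

/**
 * Resolve with whatever `doFetch(signal)` resolves to (for fetch(), that happens
 * once response headers arrive), or reject with an Error whose message contains
 * "timed out" if that takes longer than `ms`. `doFetch` must honor the signal.
 * @param {(signal: AbortSignal) => Promise<any>} doFetch
 * @param {number} ms
 */
async function withHeaderTimeout(doFetch, ms) {
  const controller = new AbortController();
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true;
    controller.abort();
  }, ms);
  try {
    return await doFetch(controller.signal);
  } catch (error) {
    if (timedOut) throw new Error(`Request timed out after ${ms} ms`);
    throw error;
  } finally {
    // Headers arrived (or the call failed): stop the timer so a body stream is never aborted.
    clearTimeout(timer);
  }
}
```

With the real `fetch`, `doFetch` would be something like `(signal) => fetch(url, { ...init, signal })`. Because `fetch` resolves as soon as headers are in, clearing the timer at that point leaves the SSE body free to stream for as long as it needs.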
@@ -1820,6 +1822,11 @@ async function fetchWithRetry(url, buildInit) {
  			return await fetchWithTimeout(url, buildInit(), timeout);
  		} catch (error) {
  			lastError = error;
+ 			const msg = error instanceof Error ? error.message : String(error);
+ 			if (msg.includes("timed out")) {
+ 				consola.warn(`Request timed out on attempt ${attempt + 1}/${maxAttempts} — not retrying (credit likely consumed):`, msg);
+ 				break;
+ 			}
  			if (attempt === 0) resetConnections();
  			if (attempt < maxAttempts - 1) {
  				const delay = RETRY_DELAYS[attempt];
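Taken together with the constants above, the retry loop now follows a small policy: at most two attempts, a fixed 2 s pause, a connection pool reset after the first failure, and an immediate stop on timeouts. The sketch below restates that policy in isolation so it can be exercised without a network. `attemptFetch` and `resetConnections` are injected stand-ins, `fetchWithRetrySketch` is a hypothetical name, and the loop body mirrors only the lines visible in this hunk, not the full `fetchWithRetry`.

```javascript
// Sketch of the retry policy above; attemptFetch and resetConnections are injected stand-ins.
const RETRY_DELAYS_MS = [2_000]; // 1 retry => 2 attempts in total

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * @param {(attempt: number) => Promise<any>} attemptFetch resolves with a response
 *   (any HTTP status) or rejects on a network-level failure or timeout
 * @param {() => void} resetConnections rebuilds the connection pool
 * @param {number[]} delays pause before each retry
 */
async function fetchWithRetrySketch(attemptFetch, resetConnections, delays = RETRY_DELAYS_MS) {
  const maxAttempts = delays.length + 1;
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      // An HTTP 400/500 resolves normally here, so it is returned rather than retried.
      return await attemptFetch(attempt);
    } catch (error) {
      lastError = error;
      const msg = error instanceof Error ? error.message : String(error);
      // A timeout probably reached upstream already: stop instead of spending another credit.
      if (msg.includes("timed out")) break;
      // First network failure: drop possibly stale sockets before retrying.
      if (attempt === 0) resetConnections();
      if (attempt < maxAttempts - 1) await sleep(delays[attempt]);
    }
  }
  throw lastError;
}
```

This assumes, as with `fetch`, that an HTTP error status resolves rather than rejects; that is why 400/500 responses never reach the `catch` block and are not retried.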
@@ -1856,6 +1863,13 @@ async function* wrapGeneratorWithRelease(gen, releaseSlot) {
  */
  const reasoningUnsupportedModels = /* @__PURE__ */ new Set();
  /**
+ * Models whose reasoning_effort must be capped at a lower level.
+ * e.g. claude-opus-4.7 rejects "high" but accepts "medium".
+ * When a model returns 400 with "is not supported by model", it is added
+ * here with its maximum supported effort level.
+ */
+ const reasoningEffortCap = /* @__PURE__ */ new Map();
+ /**
  * Compute an appropriate thinking_budget from model capabilities.
  * Returns undefined if the model does not support thinking.
  */
@@ -1885,7 +1899,9 @@ function isToolChoiceForced(toolChoice) {
  * 1. If the client already set reasoning_effort or thinking_budget → keep as-is
  * 2. If tool_choice forces tool use → skip (API rejects the combination)
  * 3. If model capabilities declare max_thinking_budget → inject thinking_budget
- * 4. Otherwise → inject reasoning_effort="high" (works on claude-*-4.6)
+ * 4. Otherwise → inject reasoning_effort at the highest level the model supports:
+ * - "high" by default (maximum thinking for most models)
+ * - Capped to "medium"/"low" if the model previously rejected "high"
  *
  * The fallback to reasoning_effort ensures thinking works even when the
  * /models endpoint doesn't expose thinking budget fields.
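The four-step order in the comment above can be condensed into a standalone function. This is a sketch under stated assumptions: whether tool use is forced and which thinking budget applies are passed in as plain values, because the bodies of `isToolChoiceForced` and the budget computation are not shown in this diff. The module-level `Set` and `Map` play the roles of `reasoningUnsupportedModels` and `reasoningEffortCap`, and `injectThinkingSketch` is a hypothetical name.

```javascript
// Hypothetical sketch of the documented injectThinking order; not the package's code.
const reasoningUnsupportedModels = new Set();
const reasoningEffortCap = new Map();

/**
 * @param {object} payload chat request body
 * @param {string} model resolved model id
 * @param {{ toolChoiceForced?: boolean, budget?: number }} opts assumed inputs
 */
function injectThinkingSketch(payload, model, { toolChoiceForced = false, budget } = {}) {
  // 1. The client already chose a thinking setting: leave it alone.
  if (payload.reasoning_effort || payload.thinking_budget) return payload;
  // 2. Forced tool use cannot be combined with thinking.
  if (toolChoiceForced) return payload;
  // 3. Capabilities expose a thinking budget: prefer it.
  if (budget) return { ...payload, thinking_budget: budget };
  // 4. Fall back to reasoning_effort, unless this model rejected it before.
  if (reasoningUnsupportedModels.has(model)) return payload;
  const effort = reasoningEffortCap.get(model) ?? "high";
  return { ...payload, reasoning_effort: effort };
}
```

Returning the original object untouched in steps 1, 2, and the unsupported case makes it cheap for a caller to detect whether anything was injected (`result !== payload`).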
@@ -1898,16 +1914,17 @@ function injectThinking(payload, resolvedModel) {
  		...payload,
  		thinking_budget: budget
  	};
- 	if (!reasoningUnsupportedModels.has(resolvedModel)) return {
+ 	if (reasoningUnsupportedModels.has(resolvedModel)) return payload;
+ 	const effort = reasoningEffortCap.get(resolvedModel) ?? "high";
+ 	return {
  		...payload,
- 		reasoning_effort: "high"
+ 		reasoning_effort: effort
  	};
- 	return payload;
  }
  function logThinkingInjection(original, injected, resolvedModel) {
- 	if (original.reasoning_effort || original.thinking_budget) consola.debug(`Thinking: client-specified (reasoning_effort=${original.reasoning_effort ?? "none"} / thinking_budget=${original.thinking_budget ?? "none"})`);
+ 	if (original.reasoning_effort || original.thinking_budget) consola.debug(`Thinking: translated (reasoning_effort=${original.reasoning_effort ?? "none"} / thinking_budget=${original.thinking_budget ?? "none"})`);
  	else if (injected.thinking_budget && injected.thinking_budget !== original.thinking_budget) consola.debug(`Thinking: injected thinking_budget=${injected.thinking_budget} for "${resolvedModel}"`);
- 	else if (injected.reasoning_effort === "high") consola.debug(`Thinking: injected reasoning_effort=high for "${resolvedModel}"`);
+ 	else if (injected.reasoning_effort && injected.reasoning_effort !== original.reasoning_effort) consola.debug(`Thinking: injected reasoning_effort=${injected.reasoning_effort} for "${resolvedModel}"`);
  	else if (reasoningUnsupportedModels.has(resolvedModel)) consola.debug(`Thinking: skipped — "${resolvedModel}" does not support reasoning`);
  }
  const createChatCompletions = async (payload) => {
@@ -1927,10 +1944,24 @@ const createChatCompletions = async (payload) => {
  		releaseSlot();
  		return result;
  	} catch (error) {
- 		if (wasInjected && error instanceof HTTPError && error.response.status === 400 && error.message.includes("Unrecognized request argument")) {
- 			reasoningUnsupportedModels.add(resolvedModel);
- 			consola.info(`Model "${resolvedModel}" does not support reasoning_effort — disabled for future requests`);
- 			return retryWithoutReasoning(routedPayload, releaseSlot);
+ 		if (error instanceof HTTPError && error.response.status === 400) {
+ 			const errMsg = error.message;
+ 			if (wasInjected && errMsg.includes("Unrecognized request argument")) {
+ 				reasoningUnsupportedModels.add(resolvedModel);
+ 				consola.info(`Model "${resolvedModel}" does not support reasoning_effort — disabled for future requests`);
+ 				return retryWithoutReasoning(routedPayload, releaseSlot);
+ 			}
+ 			if (errMsg.includes("is not supported by model")) {
+ 				const currentEffort = thinkingPayload.reasoning_effort;
+ 				if (currentEffort && currentEffort !== "medium" && currentEffort !== "low") {
+ 					reasoningEffortCap.set(resolvedModel, "medium");
+ 					consola.info(`Model "${resolvedModel}" rejected reasoning_effort="${currentEffort}" — downgrading to "medium" for future requests`);
+ 					return retryWithDowngradedReasoning({
+ 						...routedPayload,
+ 						reasoning_effort: "medium"
+ 					}, releaseSlot);
+ 				}
+ 			}
  		}
  		releaseSlot();
  		throw error;
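The new `catch` branch turns two specific 400 error messages into per-model memory plus a single retry. The sketch below reproduces that flow with an injected `dispatch` function and a hypothetical `HttpErrorLike` class in place of the package's `HTTPError` (the real code reads the status from `error.response.status`). It simplifies one point: it always injects its own `reasoning_effort` and ignores any value the client sent, which the real `injectThinking` would keep.

```javascript
// Hypothetical sketch of the 400-handling branch; dispatch and HttpErrorLike are stand-ins.
class HttpErrorLike extends Error {
  constructor(status, message) {
    super(message);
    this.status = status;
  }
}

const unsupportedModels = new Set(); // models that reject reasoning_effort entirely
const effortCap = new Map(); // model -> highest accepted effort

async function sendWithReasoningFallback(dispatch, basePayload, model) {
  const injected = unsupportedModels.has(model)
    ? basePayload
    : { ...basePayload, reasoning_effort: effortCap.get(model) ?? "high" };
  const wasInjected = injected !== basePayload;
  try {
    return await dispatch(injected);
  } catch (error) {
    if (!(error instanceof HttpErrorLike) || error.status !== 400) throw error;
    if (wasInjected && error.message.includes("Unrecognized request argument")) {
      // The model does not accept reasoning_effort at all: remember that, resend without it.
      unsupportedModels.add(model);
      return dispatch(basePayload);
    }
    const current = injected.reasoning_effort;
    if (error.message.includes("is not supported by model") && current && current !== "medium" && current !== "low") {
      // The value was too high: cap future requests at "medium" and resend once.
      effortCap.set(model, "medium");
      return dispatch({ ...basePayload, reasoning_effort: "medium" });
    }
    throw error;
  }
}
```

Because the cap is stored before the retry, only the first request to a model such as `claude-opus-4.7` pays for the rejected attempt; later requests go straight out with `"medium"`.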
@@ -1952,6 +1983,21 @@ async function retryWithoutReasoning(payload, releaseSlot) {
  	}
  }
  /**
+ * Retry a request with a downgraded reasoning_effort after the model
+ * rejected the higher value (e.g. "high" → "medium").
+ */
+ async function retryWithDowngradedReasoning(payload, releaseSlot) {
+ 	try {
+ 		const result = await dispatchRequest(payload);
+ 		if (Symbol.asyncIterator in result) return wrapGeneratorWithRelease(result, releaseSlot);
+ 		releaseSlot();
+ 		return result;
+ 	} catch (retryError) {
+ 		releaseSlot();
+ 		throw retryError;
+ 	}
+ }
+ /**
  * Dispatch request to either single-account or multi-account path.
  */
  function dispatchRequest(payload) {
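Both retry helpers end the same way: a streaming result (anything with `Symbol.asyncIterator`) is handed to `wrapGeneratorWithRelease`, and anything else releases the slot immediately. The body of `wrapGeneratorWithRelease` is outside this diff, so the following is only a sketch of one way to guarantee a single release, using `try`/`finally` around `yield*`. The names `finishWithSlot`, `releaseWhenDone`, and `once` are hypothetical.

```javascript
// Hypothetical sketch: release a concurrency slot exactly once, for plain or streaming results.
function once(fn) {
  let called = false;
  return () => {
    if (!called) {
      called = true;
      fn();
    }
  };
}

async function* releaseWhenDone(gen, releaseSlot) {
  try {
    // yield* forwards every chunk; finally runs on normal completion, on an error
    // thrown by `gen`, or when the consumer stops early (break / return()).
    yield* gen;
  } finally {
    releaseSlot();
  }
}

function finishWithSlot(result, releaseSlot) {
  const release = once(releaseSlot);
  if (result != null && typeof result[Symbol.asyncIterator] === "function") {
    return releaseWhenDone(result, release);
  }
  release();
  return result;
}
```

One caveat of this approach: a generator's `finally` only runs once iteration has started, so a caller that receives the stream and never reads from it would still hold the slot.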