@browserbasehq/orca 3.0.0-test.1 → 3.0.1

package/README.md ADDED
@@ -0,0 +1,164 @@
1
+ <div id="toc" align="center" style="margin-bottom: 0;">
2
+ <ul style="list-style: none; margin: 0; padding: 0;">
3
+ <a href="https://stagehand.dev">
4
+ <picture>
5
+ <source media="(prefers-color-scheme: dark)" srcset="../../media/dark_logo.png" />
6
+ <img alt="Stagehand" src="../../media/light_logo.png" width="200" style="margin-right: 30px;" />
7
+ </picture>
8
+ </a>
9
+ </ul>
10
+ </div>
11
+ <p align="center">
12
+ <strong>The AI Browser Automation Framework</strong><br>
13
+ <a href="https://docs.stagehand.dev">Read the Docs</a>
14
+ </p>
15
+
16
+ <p align="center">
17
+ <a href="https://github.com/browserbase/stagehand/tree/main?tab=MIT-1-ov-file#MIT-1-ov-file">
18
+ <picture>
19
+ <source media="(prefers-color-scheme: dark)" srcset="../../media/dark_license.svg" />
20
+ <img alt="MIT License" src="../../media/light_license.svg" />
21
+ </picture>
22
+ </a>
23
+ <a href="https://join.slack.com/t/stagehand-dev/shared_invite/zt-3hgv6bwqu-s7MXXgPd7_rD53aViGo1MQ">
24
+ <picture>
25
+ <source media="(prefers-color-scheme: dark)" srcset="../../media/dark_slack.svg" />
26
+ <img alt="Slack Community" src="../../media/light_slack.svg" />
27
+ </picture>
28
+ </a>
29
+ </p>
30
+
31
+ <p align="center">
32
+ <a href="https://trendshift.io/repositories/12122" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12122" alt="browserbase%2Fstagehand | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
33
+ </p>
34
+
35
+ <p align="center">
36
+ <a href="https://deepwiki.com/browserbase/stagehand">
37
+ <img alt="Ask DeepWiki" src="https://deepwiki.com/badge.svg" />
38
+ </a>
39
+ </p>
40
+
41
+ <p align="center">
42
+ If you're looking for the Python implementation, you can find it
43
+ <a href="https://github.com/browserbase/stagehand-python"> here</a>
44
+ </p>
45
+
46
+ <div align="center" style="display: flex; align-items: center; justify-content: center; gap: 4px; margin-bottom: 0;">
47
+ <b>Vibe code</b>
48
+ <span style="font-size: 1.05em;"> Stagehand with </span>
49
+ <a href="https://director.ai" style="display: flex; align-items: center;">
50
+ <span>Director</span>
51
+ </a>
52
+ <span> </span>
53
+ <picture>
54
+ <img alt="Director" src="../../media/director_icon.svg" width="25" />
55
+ </picture>
56
+ </div>
57
+
58
+ ## What is Stagehand?
59
+
60
+ Stagehand is a browser automation framework used to control web browsers with natural language and code. By combining the power of AI with the precision of code, Stagehand makes web automation flexible, maintainable, and actually reliable.
61
+
62
+ ## Why Stagehand?
63
+
64
+ Most existing browser automation tools either require you to write low-level code in a framework like Selenium, Playwright, or Puppeteer, or rely on high-level agents that can be unpredictable in production. By letting developers choose what to write in code vs. natural language (and bridging the gap between the two), Stagehand is the natural choice for browser automations in production.
65
+
66
+ 1. **Choose when to write code vs. natural language**: use AI when you want to navigate unfamiliar pages, and use code when you know exactly what you want to do.
67
+
68
+ 2. **Go from AI-driven to repeatable workflows**: Stagehand lets you preview AI actions before running them, and also helps you easily cache repeatable actions to save time and tokens.
69
+
70
+ 3. **Write once, run forever**: Stagehand's auto-caching combined with self-healing remembers previous actions, runs without LLM inference, and knows to involve AI when the website changes and your automation breaks.
71
+
72
+ ## Getting Started
73
+
74
+ Get started with Stagehand in one line of code, or check out our [Quickstart Guide](https://docs.stagehand.dev/v3/first-steps/quickstart) for more information:
75
+
76
+ ```bash
77
+ npx create-browser-app
78
+ ```
79
+
80
+ ## Example
81
+
82
+ Here's how to build a sample browser automation with Stagehand:
83
+
84
+ ```typescript
85
+ // Stagehand's CDP engine provides an optimized, low level interface to the browser built for automation
86
+ const page = stagehand.context.pages()[0];
87
+ await page.goto("https://github.com/browserbase");
88
+
89
+ // Use act() to execute individual actions
90
+ await stagehand.act("click on the stagehand repo");
91
+
92
+ // Use agent() for multi-step tasks
93
+ const agent = stagehand.agent();
94
+ await agent.execute("Get to the latest PR");
95
+
96
+ // Use extract() to get structured data from the page
97
+ const { author, title } = await stagehand.extract(
98
+ "extract the author and title of the PR",
99
+ z.object({
100
+ author: z.string().describe("The username of the PR author"),
101
+ title: z.string().describe("The title of the PR"),
102
+ }),
103
+ );
104
+ ```
105
+
106
+ ## Documentation
107
+
108
+ Visit [docs.stagehand.dev](https://docs.stagehand.dev) to view the full documentation.
109
+
110
+ ### Build and Run from Source
111
+
112
+ ```bash
113
+ git clone https://github.com/browserbase/stagehand.git
114
+ cd stagehand
115
+ pnpm install
116
+ pnpm run build
117
+ pnpm run example # run the blank script at ./examples/example.ts
118
+ ```
119
+
120
+ Stagehand works best when you have an API key for an LLM provider and Browserbase credentials. To add these to your project, run:
121
+
122
+ ```bash
123
+ cp .env.example .env
124
+ nano .env # Edit the .env file to add API keys
125
+ ```
126
+
127
+ ### Installing from a branch
128
+
129
+ You can install and build Stagehand directly from a GitHub branch using [gitpkg](https://github.com/EqualMa/gitpkg).
130
+
131
+ In your project's `package.json` set:
132
+
133
+ ```json
134
+ "@browserbasehq/stagehand": "https://gitpkg.now.sh/browserbase/stagehand/packages/core?<branchName>",
135
+ ```
136
+
137
+ ## Contributing
138
+
139
+ > [!NOTE]
140
+ > We highly value contributions to Stagehand! For questions or support, please join our [Slack community](https://join.slack.com/t/stagehand-dev/shared_invite/zt-3hgv6bwqu-s7MXXgPd7_rD53aViGo1MQ).
141
+
142
+ At a high level, we're focused on improving reliability, extensibility, speed, and cost in that order of priority. If you're interested in contributing, **bug fixes and small improvements are the best way to get started**. For more involved features, we strongly recommend reaching out to [Miguel Gonzalez](https://x.com/miguel_gonzf) or [Paul Klein](https://x.com/pk_iv) in our [Slack community](https://join.slack.com/t/stagehand-dev/shared_invite/zt-3hgv6bwqu-s7MXXgPd7_rD53aViGo1MQ) before starting to ensure that your contribution aligns with our goals.
143
+
144
+ <!-- For more information, please see our [Contributing Guide](https://docs.stagehand.dev/examples/contributing). -->
145
+
146
+ ## Acknowledgements
147
+
148
+ We'd like to thank the following people for their major contributions to Stagehand:
149
+
150
+ - [Paul Klein](https://github.com/pkiv)
151
+ - [Sean McGuire](https://github.com/seanmcguire12)
152
+ - [Miguel Gonzalez](https://github.com/miguelg719)
153
+ - [Sameel Arif](https://github.com/sameelarif)
154
+ - [Thomas Katwan](https://github.com/tkattkat)
155
+ - [Filip Michalsky](https://github.com/filip-michalsky)
156
+ - [Anirudh Kamath](https://github.com/kamath)
157
+ - [Jeremy Press](https://x.com/jeremypress)
158
+ - [Navid Pour](https://github.com/navidpour)
159
+
160
+ ## License
161
+
162
+ Licensed under the MIT License.
163
+
164
+ Copyright 2025 Browserbase, Inc.
package/dist/index.d.ts CHANGED
@@ -1,4 +1,4 @@
1
- import { ZodType, z, ZodError, ZodTypeAny } from 'zod';
1
+ import z, { ZodType, z as z$1, ZodError, ZodTypeAny } from 'zod/v3';
2
2
  import { ClientOptions as ClientOptions$2 } from '@anthropic-ai/sdk';
3
3
  import { LanguageModelV2 } from '@ai-sdk/provider';
4
4
  import { ClientOptions as ClientOptions$1 } from 'openai';
@@ -13,6 +13,7 @@ export { Page as PatchrightPage } from 'patchright-core';
13
13
  import { Protocol } from 'devtools-protocol';
14
14
  import { Buffer as Buffer$1 } from 'buffer';
15
15
  import Browserbase from '@browserbasehq/sdk';
16
+ import { ChatCompletion } from 'openai/resources';
16
17
  import { ToolSet as ToolSet$1 } from 'ai/dist';
17
18
  import { Schema } from '@google/genai';
18
19
 
@@ -140,14 +141,6 @@ interface CreateChatCompletionOptions {
140
141
  logger: (message: LogLine) => void;
141
142
  retries?: number;
142
143
  }
143
- interface LLMParsedResponse<T> {
144
- data: T;
145
- usage?: {
146
- prompt_tokens: number;
147
- completion_tokens: number;
148
- total_tokens: number;
149
- };
150
- }
151
144
  declare abstract class LLMClient {
152
145
  type: "openai" | "anthropic" | "cerebras" | "groq" | (string & {});
153
146
  modelName: AvailableModel | (string & {});
@@ -155,15 +148,9 @@ declare abstract class LLMClient {
155
148
  clientOptions: ClientOptions;
156
149
  userProvidedInstructions?: string;
157
150
  constructor(modelName: AvailableModel, userProvidedInstructions?: string);
158
- abstract createChatCompletion<T>(options: CreateChatCompletionOptions & {
159
- options: {
160
- response_model: {
161
- name: string;
162
- schema: ZodType;
163
- };
164
- };
165
- }): Promise<LLMParsedResponse<T>>;
166
- abstract createChatCompletion<T = LLMResponse>(options: CreateChatCompletionOptions): Promise<T>;
151
+ abstract createChatCompletion<T = LLMResponse & {
152
+ usage?: LLMResponse["usage"];
153
+ }>(options: CreateChatCompletionOptions): Promise<T>;
167
154
  generateObject: typeof generateObject;
168
155
  generateText: typeof generateText;
169
156
  streamText: typeof streamText;
@@ -280,6 +267,9 @@ declare class Frame implements FrameManager {
280
267
  width: number;
281
268
  height: number;
282
269
  };
270
+ type?: "png" | "jpeg";
271
+ quality?: number;
272
+ scale?: number;
283
273
  }): Promise<Buffer>;
284
274
  /** Child frames via Page.getFrameTree */
285
275
  childFrames(): Promise<Frame[]>;
@@ -593,7 +583,152 @@ declare class LocatorDelegate {
593
583
  first(): LocatorDelegate;
594
584
  }
595
585
 
586
+ type RemoteObject = Protocol.Runtime.RemoteObject;
587
+ type ConsoleListener = (message: ConsoleMessage) => void;
588
+ declare class ConsoleMessage {
589
+ private readonly event;
590
+ private readonly pageRef?;
591
+ constructor(event: Protocol.Runtime.ConsoleAPICalledEvent, pageRef?: Page);
592
+ type(): Protocol.Runtime.ConsoleAPICalledEvent["type"];
593
+ text(): string;
594
+ args(): RemoteObject[];
595
+ location(): {
596
+ url?: string;
597
+ lineNumber?: number;
598
+ columnNumber?: number;
599
+ };
600
+ page(): Page | undefined;
601
+ timestamp(): number | undefined;
602
+ raw(): Protocol.Runtime.ConsoleAPICalledEvent;
603
+ toString(): string;
604
+ }
605
+
606
+ /**
607
+ * Response
608
+ * -----------------
609
+ *
610
+ * This module implements a Playwright-inspired response wrapper that exposes
611
+ * navigation metadata and helpers for retrieving HTTP response bodies. The
612
+ * abstraction is consumed by navigation routines (e.g. `Page.goto`) so callers
613
+ * can synchronously inspect status codes, lazily fetch body text, or await the
614
+ * network layer finishing the request. The implementation is built directly on
615
+ * Chrome DevTools Protocol primitives – it holds the originating `requestId`
616
+ * so it can request payloads via `Network.getResponseBody`, and it listens for
617
+ * `responseReceivedExtraInfo`, `loadingFinished`, and `loadingFailed` events to
618
+ * hydrate the richer header view and resolve callers waiting on completion.
619
+ */
620
+
621
+ type ServerAddr = {
622
+ ipAddress: string;
623
+ port: number;
624
+ };
625
+ /**
626
+ * Thin wrapper around CDP response metadata that mirrors the ergonomics of
627
+ * Playwright's `Response` class. The class intentionally keeps the same method
628
+ * names so upstream integrations can transition with minimal code changes.
629
+ */
630
+ declare class Response$1 {
631
+ private readonly page;
632
+ private readonly session;
633
+ private readonly requestId;
634
+ private readonly frameId?;
635
+ private readonly loaderId?;
636
+ private readonly response;
637
+ private readonly fromServiceWorkerFlag;
638
+ private readonly serverAddress?;
639
+ private headersObject;
640
+ private headersArrayCache;
641
+ private allHeadersCache;
642
+ private readonly headerValuesMap;
643
+ private finishedDeferred;
644
+ private finishedSettled;
645
+ private extraInfoHeaders;
646
+ private extraInfoHeadersText;
647
+ /**
648
+ * Build a response wrapper from the CDP notification associated with a
649
+ * navigation. The constructor captures the owning page/session so follow-up
650
+ * methods (body/text/json) can query CDP on-demand. The `response` payload is
651
+ * the raw `Protocol.Network.Response` object emitted by Chrome.
652
+ */
653
+ constructor(params: {
654
+ page: Page;
655
+ session: CDPSessionLike;
656
+ requestId: string;
657
+ frameId?: string;
658
+ loaderId?: string;
659
+ response: Protocol.Network.Response;
660
+ fromServiceWorker: boolean;
661
+ });
662
+ /** URL associated with the navigation request. */
663
+ url(): string;
664
+ /** HTTP status code reported by Chrome. */
665
+ status(): number;
666
+ /** Human-readable status text that accompanied the response. */
667
+ statusText(): string;
668
+ /** Convenience predicate that checks for 2xx statuses. */
669
+ ok(): boolean;
670
+ /** Returns the Stagehand frame object that initiated the navigation. */
671
+ frame(): Frame | null;
672
+ /** Indicates whether the response was serviced by a Service Worker. */
673
+ fromServiceWorker(): boolean;
674
+ /**
675
+ * Returns TLS security metadata when provided by the browser. In practice
676
+ * this includes certificate issuer, protocol, and validity interval.
677
+ */
678
+ securityDetails(): Promise<Protocol.Network.SecurityDetails | null>;
679
+ /** Returns the resolved server address for the navigation when available. */
680
+ serverAddr(): Promise<ServerAddr | null>;
681
+ /**
682
+ * Returns the response headers normalised to lowercase keys. Matches the
683
+ * behaviour of Playwright's `headers()` by eliding duplicate header entries.
684
+ */
685
+ headers(): Record<string, string>;
686
+ /**
687
+ * Returns all headers including those only surfaced through
688
+ * `responseReceivedExtraInfo` such as `set-cookie`. Values are reported as the
689
+ * browser sends them (no further splitting or concatenation).
690
+ */
691
+ allHeaders(): Promise<Record<string, string>>;
692
+ /** Returns a concatenated header string for the supplied header name. */
693
+ headerValue(name: string): Promise<string | null>;
694
+ /** Returns all values for a header (case-insensitive lookup). */
695
+ headerValues(name: string): Promise<string[]>;
696
+ /**
697
+ * Returns header entries preserving their original wire casing and ordering.
698
+ * Falls back to the CDP object when the raw header text is unavailable.
699
+ */
700
+ headersArray(): Promise<Array<{
701
+ name: string;
702
+ value: string;
703
+ }>>;
704
+ /**
705
+ * Requests the raw response body from Chrome DevTools Protocol. The method is
706
+ * intentionally lazy because not every caller needs the payload, and CDP only
707
+ * allows retrieving it once the response completes.
708
+ */
709
+ body(): Promise<Buffer>;
710
+ /** Decodes the response body as UTF-8 text. */
711
+ text(): Promise<string>;
712
+ /** Parses the response body as JSON and throws if parsing fails. */
713
+ json<T = unknown>(): Promise<T>;
714
+ /**
715
+ * Resolves once the underlying network request completes or fails. Mirrors
716
+ * Playwright's behaviour by resolving to `null` on success and to an `Error`
717
+ * instance when Chrome reports `Network.loadingFailed`.
718
+ */
719
+ finished(): Promise<null | Error>;
720
+ /**
721
+ * Internal helper invoked by the navigation tracker when CDP reports extra
722
+ * header information. This keeps the cached header views in sync with the
723
+ * richer metadata.
724
+ */
725
+ applyExtraInfo(event: Protocol.Network.ResponseReceivedExtraInfoEvent): void;
726
+ /** Marks the response as finished and resolves the `finished()` promise. */
727
+ markFinished(error: Error | null): void;
728
+ }
729
+
596
730
  type AnyPage = Page$1 | Page$2 | Page$3 | Page;
731
+
597
732
  type LoadState = "load" | "domcontentloaded" | "networkidle";
598
733
 
599
734
  declare class StagehandAPIClient {
@@ -606,17 +741,43 @@ declare class StagehandAPIClient {
606
741
  constructor({ apiKey, projectId, logger }: StagehandAPIConstructorParams);
607
742
  init({ modelName, modelApiKey, domSettleTimeoutMs, verbose, systemPrompt, selfHeal, browserbaseSessionCreateParams, browserbaseSessionID, }: StartSessionParams): Promise<StartSessionResult>;
608
743
  act({ input, options, frameId }: APIActParameters): Promise<ActResult>;
609
- extract<T extends z.ZodObject>({ instruction, schema: zodSchema, options, frameId, }: APIExtractParameters): Promise<ExtractResult<T>>;
744
+ extract<T extends z.AnyZodObject>({ instruction, schema: zodSchema, options, frameId, }: APIExtractParameters): Promise<ExtractResult<T>>;
610
745
  observe({ instruction, options, frameId, }: APIObserveParameters): Promise<Action[]>;
611
746
  goto(url: string, options?: {
612
747
  waitUntil?: "load" | "domcontentloaded" | "networkidle";
613
748
  }, frameId?: string): Promise<void>;
614
749
  agentExecute(agentConfig: AgentConfig, executeOptions: AgentExecuteOptions | string, frameId?: string): Promise<AgentResult>;
615
750
  end(): Promise<Response>;
751
+ getReplayMetrics(): Promise<StagehandMetrics>;
616
752
  private execute;
617
753
  private request;
618
754
  }
619
755
 
756
+ type ScreenshotAnimationsOption = "disabled" | "allow";
757
+ type ScreenshotCaretOption = "hide" | "initial";
758
+ type ScreenshotScaleOption = "css" | "device";
759
+ interface ScreenshotClip {
760
+ x: number;
761
+ y: number;
762
+ width: number;
763
+ height: number;
764
+ }
765
+ interface ScreenshotOptions {
766
+ animations?: ScreenshotAnimationsOption;
767
+ caret?: ScreenshotCaretOption;
768
+ clip?: ScreenshotClip;
769
+ fullPage?: boolean;
770
+ mask?: Locator[];
771
+ maskColor?: string;
772
+ omitBackground?: boolean;
773
+ path?: string;
774
+ quality?: number;
775
+ scale?: ScreenshotScaleOption;
776
+ style?: string;
777
+ timeout?: number;
778
+ type?: "png" | "jpeg";
779
+ }
780
+
620
781
  declare class Page {
621
782
  private readonly conn;
622
783
  private readonly mainSession;
@@ -641,6 +802,8 @@ declare class Page {
641
802
  private readonly networkManager;
642
803
  /** Optional API client for routing page operations to the API */
643
804
  private readonly apiClient;
805
+ private readonly consoleListeners;
806
+ private readonly consoleHandlers;
644
807
  private constructor();
645
808
  private cursorEnabled;
646
809
  private ensureCursorScript;
@@ -681,7 +844,30 @@ declare class Page {
681
844
  getSessionById(id: string): CDPSessionLike | undefined;
682
845
  registerSessionForNetwork(session: CDPSessionLike): void;
683
846
  unregisterSessionForNetwork(sessionId: string | undefined): void;
847
+ on(event: "console", listener: ConsoleListener): Page;
848
+ once(event: "console", listener: ConsoleListener): Page;
849
+ off(event: "console", listener: ConsoleListener): Page;
684
850
  targetId(): string;
851
+ /**
852
+ * Send a CDP command through the main session.
853
+ * Allows external consumers to execute arbitrary Chrome DevTools Protocol commands.
854
+ *
855
+ * @param method - The CDP method name (e.g., "Page.enable", "Runtime.evaluate")
856
+ * @param params - Optional parameters for the CDP command
857
+ * @returns Promise resolving to the typed CDP response
858
+ *
859
+ * @example
860
+ * // Enable the Runtime domain
861
+ * await page.sendCDP("Runtime.enable");
862
+ *
863
+ * @example
864
+ * // Evaluate JavaScript with typed response
865
+ * const result = await page.sendCDP<Protocol.Runtime.EvaluateResponse>(
866
+ * "Runtime.evaluate",
867
+ * { expression: "1 + 1" }
868
+ * );
869
+ */
870
+ sendCDP<T = unknown>(method: string, params?: object): Promise<T>;
685
871
  /** Seed the cached URL before navigation events converge. */
686
872
  seedCurrentUrl(url: string | undefined | null): void;
687
873
  mainFrameId(): string;
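Note on the hunk above: `Page` gains `on`, `once`, and `off` for the `"console"` event, each returning the page. A minimal sketch of how a caller might collect console errors and detach the listener afterwards is shown below. The `*Like` interfaces and the `collectConsoleErrors` helper are hypothetical stand-ins written for this note, not part of the package; only the method names `on`, `off`, `type()`, and `text()` are taken from the declarations.

```typescript
// Structural stand-ins for the declared ConsoleMessage / Page surface (illustrative only).
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}
type ConsoleListenerLike = (message: ConsoleMessageLike) => void;
interface ConsoleSourceLike {
  on(event: "console", listener: ConsoleListenerLike): ConsoleSourceLike;
  off(event: "console", listener: ConsoleListenerLike): ConsoleSourceLike;
}

// Hypothetical helper: records the text of every console message whose type is "error".
// Call stop() to remove the listener again.
function collectConsoleErrors(page: ConsoleSourceLike): {
  errors: string[];
  stop: () => void;
} {
  const errors: string[] = [];
  const listener: ConsoleListenerLike = (message) => {
    if (message.type() === "error") {
      errors.push(message.text());
    }
  };
  page.on("console", listener);
  return {
    errors,
    stop: () => {
      page.off("console", listener);
    },
  };
}
```

Keeping a reference to the listener function matters here, because `off` takes the same listener that was passed to `on`.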
@@ -696,6 +882,13 @@ declare class Page {
696
882
  /** Public getter for snapshot code / handlers. */
697
883
  getOrdinal(frameId: string): number;
698
884
  listAllFrameIds(): string[];
885
+ private ensureConsoleTaps;
886
+ private installConsoleTap;
887
+ private sessionKey;
888
+ private resolveSessionByKey;
889
+ private teardownConsoleTap;
890
+ private removeAllConsoleTaps;
891
+ private emitConsole;
699
892
  /**
700
893
  * Navigate the page; optionally wait for a lifecycle state.
701
894
  * Waits on the **current** main frame and follows root swaps during navigation.
@@ -703,7 +896,7 @@ declare class Page {
703
896
  goto(url: string, options?: {
704
897
  waitUntil?: LoadState;
705
898
  timeoutMs?: number;
706
- }): Promise<void>;
899
+ }): Promise<Response$1 | null>;
707
900
  /**
708
901
  * Reload the page; optionally wait for a lifecycle state.
709
902
  */
@@ -711,21 +904,21 @@ declare class Page {
711
904
  waitUntil?: LoadState;
712
905
  timeoutMs?: number;
713
906
  ignoreCache?: boolean;
714
- }): Promise<void>;
907
+ }): Promise<Response$1 | null>;
715
908
  /**
716
909
  * Navigate back in history if possible; optionally wait for a lifecycle state.
717
910
  */
718
911
  goBack(options?: {
719
912
  waitUntil?: LoadState;
720
913
  timeoutMs?: number;
721
- }): Promise<void>;
914
+ }): Promise<Response$1 | null>;
722
915
  /**
723
916
  * Navigate forward in history if possible; optionally wait for a lifecycle state.
724
917
  */
725
918
  goForward(options?: {
726
919
  waitUntil?: LoadState;
727
920
  timeoutMs?: number;
728
- }): Promise<void>;
921
+ }): Promise<Response$1 | null>;
729
922
  /**
730
923
  * Return the current page URL (synchronous, cached from navigation events).
731
924
  */
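Note on the hunk above: `goto`, `reload`, `goBack`, and `goForward` now resolve to `Response | null` instead of `void`, so callers that want the status have to handle the `null` case. The sketch below shows one way to do that. `NavigationResponseLike` and `describeNavigation` are hypothetical names introduced for this note; the declarations do not say when `null` is returned, so the wording of the `null` branch is an assumption.

```typescript
// Structural stand-in for the subset of the new Response wrapper used here.
// Method names follow the declarations in this diff; the interface itself is illustrative.
interface NavigationResponseLike {
  url(): string;
  status(): number;
  ok(): boolean;
}

// Hypothetical helper: turns a navigation result into a one-line summary.
function describeNavigation(response: NavigationResponseLike | null): string {
  if (response === null) {
    // Assumption: null means no response object was captured for this navigation.
    return "no response captured";
  }
  const verdict = response.ok() ? "ok" : "error";
  return `${response.status()} ${verdict} ${response.url()}`;
}
```

Code that previously ignored the `void` result keeps working unchanged; the guard is only needed when the return value is actually read.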
@@ -739,11 +932,36 @@ declare class Page {
739
932
  */
740
933
  title(): Promise<string>;
741
934
  /**
742
- * Capture a screenshot (delegated to the current main frame).
743
- */
744
- screenshot(options?: {
745
- fullPage?: boolean;
746
- }): Promise<Buffer>;
935
+ * Capture a screenshot with Playwright-style options.
936
+ *
937
+ * @param options Optional screenshot configuration.
938
+ * @param options.animations Control CSS/Web animations during capture. Use
939
+ * "disabled" to fast-forward finite animations and pause infinite ones.
940
+ * @param options.caret Either hide the text caret (default) or leave it
941
+ * visible via "initial".
942
+ * @param options.clip Restrict capture to a specific rectangle (in CSS
943
+ * pixels). Cannot be combined with `fullPage`.
944
+ * @param options.fullPage Capture the full scrollable page instead of the
945
+ * current viewport.
946
+ * @param options.mask Array of locators that should be covered with an
947
+ * overlay while the screenshot is taken.
948
+ * @param options.maskColor CSS color used for the mask overlay (default
949
+ * `#FF00FF`).
950
+ * @param options.omitBackground Make the default page background transparent
951
+ * (PNG only).
952
+ * @param options.path File path to write the screenshot to. The file extension
953
+ * determines the image type when `type` is not explicitly provided.
954
+ * @param options.quality JPEG quality (0–100). Only applies when
955
+ * `type === "jpeg"`.
956
+ * @param options.scale Render scale: use "css" for one pixel per CSS pixel,
957
+ * otherwise the default "device" leverages the current device pixel ratio.
958
+ * @param options.style Additional CSS text injected into every frame before
959
+ * capture (removed afterwards).
960
+ * @param options.timeout Maximum capture duration in milliseconds before a
961
+ * timeout error is thrown.
962
+ * @param options.type Image format (`"png"` by default).
963
+ */
964
+ screenshot(options?: ScreenshotOptions): Promise<Buffer>;
747
965
  /**
748
966
  * Create a locator bound to the current main frame.
749
967
  */
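Note on the hunk above: the doc comment for `screenshot` states several constraints between options (`clip` cannot be combined with `fullPage`, `quality` is 0–100 and only applies to JPEG, `omitBackground` is PNG only). A small pre-flight check along those lines might look like the sketch below. `ScreenshotOptionsSubset` and `checkScreenshotOptions` are hypothetical and not part of the package, and the diff does not show whether the library itself throws or silently ignores these combinations.

```typescript
// Local copy of the relevant fields from the ScreenshotOptions declaration (illustrative only).
interface ScreenshotOptionsSubset {
  clip?: { x: number; y: number; width: number; height: number };
  fullPage?: boolean;
  omitBackground?: boolean;
  quality?: number;
  type?: "png" | "jpeg";
}

// Hypothetical helper: lists option combinations that the doc comment describes
// as unsupported or without effect. Only an explicitly set `type` is checked,
// because the type may otherwise be derived from the `path` extension.
function checkScreenshotOptions(options: ScreenshotOptionsSubset): string[] {
  const problems: string[] = [];
  if (options.clip !== undefined && options.fullPage === true) {
    problems.push("clip cannot be combined with fullPage");
  }
  if (options.quality !== undefined) {
    if (options.quality < 0 || options.quality > 100) {
      problems.push("quality must be between 0 and 100");
    }
    if (options.type === "png") {
      problems.push("quality only applies to jpeg");
    }
  }
  if (options.omitBackground === true && options.type === "jpeg") {
    problems.push("omitBackground is PNG only");
  }
  return problems;
}
```

Returning a list instead of throwing lets the caller decide whether a combination is fatal or just worth logging.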
@@ -1075,7 +1293,7 @@ interface ActResult {
1075
1293
  actionDescription: string;
1076
1294
  actions: Action[];
1077
1295
  }
1078
- type ExtractResult<T extends z.ZodObject> = z.infer<T>;
1296
+ type ExtractResult<T extends z$1.AnyZodObject> = z$1.infer<T>;
1079
1297
  interface Action {
1080
1298
  selector: string;
1081
1299
  description: string;
@@ -1094,12 +1312,20 @@ interface ExtractOptions {
1094
1312
  selector?: string;
1095
1313
  page?: Page$1 | Page$2 | Page$3 | Page;
1096
1314
  }
1097
- declare const defaultExtractSchema: z.ZodObject<{
1098
- extraction: z.ZodString;
1099
- }, z.core.$strip>;
1100
- declare const pageTextSchema: z.ZodObject<{
1101
- pageText: z.ZodString;
1102
- }, z.core.$strip>;
1315
+ declare const defaultExtractSchema: z$1.ZodObject<{
1316
+ extraction: z$1.ZodString;
1317
+ }, "strip", z$1.ZodTypeAny, {
1318
+ extraction?: string;
1319
+ }, {
1320
+ extraction?: string;
1321
+ }>;
1322
+ declare const pageTextSchema: z$1.ZodObject<{
1323
+ pageText: z$1.ZodString;
1324
+ }, "strip", z$1.ZodTypeAny, {
1325
+ pageText?: string;
1326
+ }, {
1327
+ pageText?: string;
1328
+ }>;
1103
1329
  interface ObserveOptions {
1104
1330
  model?: ModelConfiguration;
1105
1331
  timeout?: number;
@@ -1297,6 +1523,15 @@ declare class StagehandShadowSegmentNotFoundError extends StagehandError {
1297
1523
  constructor(segment: string, hint?: string);
1298
1524
  }
1299
1525
 
1526
+ declare class AISdkClient extends LLMClient {
1527
+ type: "aisdk";
1528
+ private model;
1529
+ constructor({ model }: {
1530
+ model: LanguageModelV2;
1531
+ });
1532
+ createChatCompletion<T = ChatCompletion>({ options, }: CreateChatCompletionOptions): Promise<T>;
1533
+ }
1534
+
1300
1535
  interface StagehandAPIConstructorParams {
1301
1536
  apiKey: string;
1302
1537
  projectId: string;
@@ -1591,6 +1826,7 @@ declare class V3 {
1591
1826
  private readonly domSettleTimeoutMs?;
1592
1827
  private _isClosing;
1593
1828
  browserbaseSessionId?: string;
1829
+ get browserbaseSessionID(): string | undefined;
1594
1830
  private _onCdpClosed;
1595
1831
  readonly experimental: boolean;
1596
1832
  readonly logInferenceToFile: boolean;
@@ -1610,7 +1846,7 @@ declare class V3 {
1610
1846
  constructor(opts: V3Options);
1611
1847
  /**
1612
1848
  * Async property for metrics so callers can `await v3.metrics`.
1613
- * Returning a Promise future-proofs async aggregation/storage.
1849
+ * When using API mode, fetches metrics from the API. Otherwise returns local metrics.
1614
1850
  */
1615
1851
  get metrics(): Promise<StagehandMetrics>;
1616
1852
  private resolveLlmClient;
@@ -1657,10 +1893,10 @@ declare class V3 {
1657
1893
  * - extract(instruction, schema) → schema-inferred
1658
1894
  * - extract(instruction, schema, options)
1659
1895
  */
1660
- extract(): Promise<z.infer<typeof pageTextSchema>>;
1661
- extract(options: ExtractOptions): Promise<z.infer<typeof pageTextSchema>>;
1662
- extract(instruction: string, options?: ExtractOptions): Promise<z.infer<typeof defaultExtractSchema>>;
1663
- extract<T extends ZodTypeAny>(instruction: string, schema: T, options?: ExtractOptions): Promise<z.infer<T>>;
1896
+ extract(): Promise<z$1.infer<typeof pageTextSchema>>;
1897
+ extract(options: ExtractOptions): Promise<z$1.infer<typeof pageTextSchema>>;
1898
+ extract(instruction: string, options?: ExtractOptions): Promise<z$1.infer<typeof defaultExtractSchema>>;
1899
+ extract<T extends ZodTypeAny>(instruction: string, schema: T, options?: ExtractOptions): Promise<z$1.infer<T>>;
1664
1900
  /**
1665
1901
  * Run an "observe" instruction through the ObserveHandler.
1666
1902
  */
@@ -1732,14 +1968,14 @@ declare class AgentProvider {
1732
1968
  static getAgentProvider(modelName: string): AgentProviderType;
1733
1969
  }
1734
1970
 
1735
- declare function validateZodSchema(schema: z.ZodTypeAny, data: unknown): boolean;
1971
+ declare function validateZodSchema(schema: z$1.ZodTypeAny, data: unknown): boolean;
1736
1972
  /**
1737
1973
  * Detects if the code is running in the Bun runtime environment.
1738
1974
  * @returns {boolean} True if running in Bun, false otherwise.
1739
1975
  */
1740
1976
  declare function isRunningInBun(): boolean;
1741
- declare function toGeminiSchema(zodSchema: z.ZodTypeAny): Schema;
1742
- declare function getZodType(schema: z.ZodTypeAny): string;
1977
+ declare function toGeminiSchema(zodSchema: z$1.ZodTypeAny): Schema;
1978
+ declare function getZodType(schema: z$1.ZodTypeAny): string;
1743
1979
  /**
1744
1980
  * Recursively traverses a given Zod schema, scanning for any fields of type `z.string().url()`.
1745
1981
  * For each such field, it replaces the `z.string().url()` with `z.number()`.
@@ -1753,7 +1989,7 @@ declare function getZodType(schema: z.ZodTypeAny): string;
1753
1989
  * 1. The updated Zod schema, with any `.url()` fields replaced by `z.number()`.
1754
1990
  * 2. An array of {@link ZodPathSegments} objects representing each replaced field, including the path segments.
1755
1991
  */
1756
- declare function transformSchema(schema: z.ZodTypeAny, currentPath: Array<string | number>): [z.ZodTypeAny, ZodPathSegments[]];
1992
+ declare function transformSchema(schema: z$1.ZodTypeAny, currentPath: Array<string | number>): [z$1.ZodTypeAny, ZodPathSegments[]];
1757
1993
  /**
1758
1994
  * Once we get the final extracted object that has numeric IDs in place of URLs,
1759
1995
  * use `injectUrls` to walk the object and replace numeric IDs
@@ -1822,4 +2058,4 @@ declare class V3Evaluator {
1822
2058
  private _evaluateWithMultipleScreenshots;
1823
2059
  }
1824
2060
 
1825
- export { type AISDKCustomProvider, type AISDKProvider, AVAILABLE_CUA_MODELS, type ActOptions, type ActResult, type Action, type ActionExecutionResult, type AgentAction, type AgentConfig, type AgentExecuteOptions, type AgentExecutionOptions, type AgentHandlerOptions, type AgentInstance, type AgentModelConfig, AgentProvider, type AgentProviderType, type AgentResult, AgentScreenshotProviderError, type AgentType, AnnotatedScreenshotText, type AnthropicContentBlock, type AnthropicJsonSchemaObject, type AnthropicMessage, type AnthropicTextBlock, type AnthropicToolResult, type AnyPage, type AvailableCuaModel, type AvailableModel, BrowserbaseSessionNotFoundError, CaptchaTimeoutError, type ChatCompletionOptions, type ChatMessage, type ChatMessageContent, type ChatMessageImageContent, type ChatMessageTextContent, type ClientOptions, type ComputerCallItem, ContentFrameNotFoundError, type CreateChatCompletionOptions, CreateChatCompletionResponseError, ExperimentalApiConflictError, ExperimentalNotConfiguredError, type ExtractOptions, type ExtractResult, type FunctionCallItem, HandlerNotInitializedError, type HistoryEntry, InvalidAISDKModelFormatError, type JsonSchema, type JsonSchemaProperty, LLMClient, type LLMParsedResponse, type LLMResponse, LLMResponseError, type LLMTool, LOG_LEVEL_NAMES, type LoadState, type LocalBrowserLaunchOptions, type LogLevel, type LogLine, type Logger, MCPConnectionError, MissingEnvironmentVariableError, MissingLLMConfigurationError, type ModelConfiguration, type ModelProvider, type ObserveOptions, type ResponseInputItem, type ResponseItem, V3 as Stagehand, StagehandAPIError, StagehandAPIUnauthorizedError, StagehandClickError, StagehandDefaultError, StagehandDomProcessError, StagehandElementNotFoundError, StagehandEnvironmentError, StagehandError, StagehandEvalError, StagehandHttpError, StagehandIframeError, StagehandInitError, StagehandInvalidArgumentError, type StagehandMetrics, StagehandMissingArgumentError, StagehandNotInitializedError, 
StagehandResponseBodyError, StagehandResponseParseError, StagehandServerError, StagehandShadowRootMissingError, StagehandShadowSegmentEmptyError, StagehandShadowSegmentNotFoundError, type ToolUseItem, UnsupportedAISDKModelProviderError, UnsupportedModelError, UnsupportedModelProviderError, V3, type V3Env, V3Evaluator, V3FunctionName, type V3Options, XPathResolutionError, ZodSchemaValidationError, connectToMCPServer, defaultExtractSchema, getZodType, injectUrls, isRunningInBun, jsonSchemaToZod, loadApiKeyFromEnv, modelToAgentProviderMap, pageTextSchema, providerEnvVarMap, toGeminiSchema, transformSchema, trimTrailingTextNode, validateZodSchema };
2061
+ export { type AISDKCustomProvider, type AISDKProvider, AISdkClient, AVAILABLE_CUA_MODELS, type ActOptions, type ActResult, type Action, type ActionExecutionResult, type AgentAction, type AgentConfig, type AgentExecuteOptions, type AgentExecutionOptions, type AgentHandlerOptions, type AgentInstance, type AgentModelConfig, AgentProvider, type AgentProviderType, type AgentResult, AgentScreenshotProviderError, type AgentType, AnnotatedScreenshotText, type AnthropicContentBlock, type AnthropicJsonSchemaObject, type AnthropicMessage, type AnthropicTextBlock, type AnthropicToolResult, type AnyPage, type AvailableCuaModel, type AvailableModel, BrowserbaseSessionNotFoundError, CaptchaTimeoutError, type ChatCompletionOptions, type ChatMessage, type ChatMessageContent, type ChatMessageImageContent, type ChatMessageTextContent, type ClientOptions, type ComputerCallItem, type ConsoleListener, ConsoleMessage, ContentFrameNotFoundError, type CreateChatCompletionOptions, CreateChatCompletionResponseError, ExperimentalApiConflictError, ExperimentalNotConfiguredError, type ExtractOptions, type ExtractResult, type FunctionCallItem, HandlerNotInitializedError, type HistoryEntry, InvalidAISDKModelFormatError, type JsonSchema, type JsonSchemaProperty, LLMClient, type LLMResponse, LLMResponseError, type LLMTool, LOG_LEVEL_NAMES, type LoadState, type LocalBrowserLaunchOptions, type LogLevel, type LogLine, type Logger, MCPConnectionError, MissingEnvironmentVariableError, MissingLLMConfigurationError, type ModelConfiguration, type ModelProvider, type ObserveOptions, Page, Response$1 as Response, type ResponseInputItem, type ResponseItem, V3 as Stagehand, StagehandAPIError, StagehandAPIUnauthorizedError, StagehandClickError, StagehandDefaultError, StagehandDomProcessError, StagehandElementNotFoundError, StagehandEnvironmentError, StagehandError, StagehandEvalError, StagehandHttpError, StagehandIframeError, StagehandInitError, StagehandInvalidArgumentError, type StagehandMetrics, 
StagehandMissingArgumentError, StagehandNotInitializedError, StagehandResponseBodyError, StagehandResponseParseError, StagehandServerError, StagehandShadowRootMissingError, StagehandShadowSegmentEmptyError, StagehandShadowSegmentNotFoundError, type ToolUseItem, UnsupportedAISDKModelProviderError, UnsupportedModelError, UnsupportedModelProviderError, V3, type V3Env, V3Evaluator, V3FunctionName, type V3Options, XPathResolutionError, ZodSchemaValidationError, connectToMCPServer, defaultExtractSchema, getZodType, injectUrls, isRunningInBun, jsonSchemaToZod, loadApiKeyFromEnv, modelToAgentProviderMap, pageTextSchema, providerEnvVarMap, toGeminiSchema, transformSchema, trimTrailingTextNode, validateZodSchema };