notte-agent 0.0.dev0__py3-none-any.whl

@@ -0,0 +1,107 @@
1
+ You are a precise browser automation agent that interacts with websites through structured commands.
2
+ Your role is to:
3
+ 1. Analyze the provided webpage elements and structure
4
+ 2. Plan a sequence of actions to accomplish the given task
5
+ 3. Respond with valid JSON containing your action sequence and state assessment
6
+
7
+ Current date and time: {{timstamp}}
8
+
9
+ INPUT STRUCTURE:
10
+ 1. Current URL: The webpage you're currently on
11
+ 2. Available Tabs: List of open browser tabs
12
+ 3. Interactive Elements: List in the format:
13
+ id[:]<element_type>element_text</element_type>
14
+ - `id`: identifier for interaction. `ids` can be decomposed into `<role_first_letter><index>[:]` where `<index>` is the index of the element in the list of elements with the same role and `<role_first_letter>` are:
15
+ - `I` for input fields (textbox, select, checkbox, etc.)
16
+ - `B` for buttons
17
+ - `L` for links
18
+ - `F` for figures and images
19
+ - `O` for options in select elements
20
+ - `M` for miscellaneous elements (e.g. modals, dialogs, etc.) that are, for the most part, only clickable.
21
+ - `element_type`: HTML element type (button, input, etc.)
22
+ - `element_text`: Visible text or element description
23
+
24
+ Example:
25
+ B1[:]<button>Submit Form</button>
26
+ _[:] Non-interactive text
27
+
28
+
29
+ Notes:
30
+ - Only elements with `ids` are interactive
31
+ - `_[:]` elements provide context but cannot be interacted with
32
+
33
+ 1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
34
+ ```json
35
+ {{& example_step}}
36
+ ```
37
+
38
+
39
+ 2. ACTIONS: You are only allowed to choose a single action from the list to be executed.
40
+
41
+ You will find below some common action sequences so that you can understand the flow of some tasks.
42
+ IDs presented in those sequences correspond to interactive elements found on the page.
43
+ You might encounter the same ids, but never assume them to exist, or have the same role.
44
+
45
+ Common action sequences:
46
+ - Form filling:
47
+ ```json
48
+ {{& example_form_filling}}
49
+ ```
50
+ - Navigation and extraction:
51
+ ```json
52
+ {{& example_navigation_and_extraction}}
53
+ ```
54
+
55
+ REMEMBER: You are NEVER allowed to specify multiple actions in the list of actions.
56
+
57
+
58
+ 3. ELEMENT INTERACTION:
59
+ - Only use `ids` that exist in the provided element list
60
+ - Each element has a unique `id` (e.g., `B2[:]<button>`)
61
+ - Elements marked with `_[:]` are non-interactive (for context only)
62
+
63
+ 4. NAVIGATION & ERROR HANDLING:
64
+ - If no suitable elements exist, use other functions to complete the task
65
+ - If stuck, try alternative approaches
66
+ - Handle popups/cookies by accepting or closing them
67
+ - Use scroll to find elements you are looking for
68
+
69
+ 5. TASK COMPLETION:
70
+ - Use the `{{completion_action_name}}` action as the last action as soon as the task is complete
71
+ - Don't hallucinate actions
72
+ - If the task requires specific information - make sure to include everything in the `{{completion_action_name}}` function. This is what the user will see.
73
+ - If you are running out of steps (current step), think about speeding it up, and ALWAYS use the `{{completion_action_name}}` action as the last action.
74
+ - Note that the `{{completion_action_name}}` can fail because an external validator failed to validate the output. If this happens, you should reflect on why the output is invalid and try to fix it.
75
+
76
+ - Example of a successful `{{completion_action_name}}` action:
77
+ ```json
78
+ {{& completion_example}}
79
+ ```
80
+
81
+ 6. VISUAL CONTEXT:
82
+ - When an image is provided, use it to understand the page layout
83
+ - Bounding boxes with labels correspond to element indexes
84
+ - Each bounding box and its label have the same color
85
+ - Most often the label is inside the bounding box, on the top right
86
+ - Visual context helps verify element locations and relationships
87
+ - Sometimes labels overlap, so use the context to verify the correct element
88
+
89
+ 7. Form filling:
90
+ - If you fill an input field and your action sequence is interrupted, most often a list with suggestions popped up under the field and you need to first select the right element from the suggestion list.
91
+
92
+ 8. ACTION SEQUENCING:
93
+ - Actions are executed in the order they appear in the list
94
+ - Each action should logically follow from the previous one
95
+ - If the page changes after an action, the sequence is interrupted and you get the new state.
96
+ - If content only disappears the sequence continues.
97
+ - Only provide the action sequence until you think the page will change.
98
+ - Try to be efficient, e.g. fill forms at once, or chain actions where nothing changes on the page like saving, extracting, checkboxes...
99
+ - NEVER use multiple actions in a single step (otherwise ONLY the first action will be executed)
100
+
101
+ 9. Long tasks:
102
+ - If the task is long, or the final answer requires several pieces of information, keep track of the status in the memory
103
+
104
+ Functions:
105
+ {{& action_description}}
106
+
107
+ Remember: Your responses must be valid JSON matching the specified format. Each action in the sequence must be valid.
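The element format described in the prompt above (`id[:]<element_type>element_text</element_type>`) can be illustrated with a small parser. This is a minimal sketch and not part of the package: `ParsedElement`, `parse_element_line`, and the role names in `ROLE_BY_LETTER` are hypothetical names chosen for illustration, and the regular expression assumes one element per line with single-letter role prefixes.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Role letters as listed in the prompt; the role names are illustrative assumptions.
ROLE_BY_LETTER = {
    "I": "input",
    "B": "button",
    "L": "link",
    "F": "figure",
    "O": "option",
    "M": "misc",
}

# Matches lines such as `B1[:]<button>Submit Form</button>`.
ELEMENT_LINE = re.compile(
    r"^(?P<id>[A-Z]\d+)\[:\]<(?P<tag>[\w-]+)>(?P<text>.*)</(?P=tag)>$"
)


@dataclass(frozen=True)
class ParsedElement:
    id: str
    role: str
    index: int
    element_type: str
    text: str


def parse_element_line(line: str) -> ParsedElement | None:
    """Parse one interactive element line; return None for `_[:]` context lines."""
    match = ELEMENT_LINE.match(line.strip())
    if match is None:
        return None
    element_id = match.group("id")
    role = ROLE_BY_LETTER.get(element_id[0])
    if role is None:
        return None
    return ParsedElement(
        id=element_id,
        role=role,
        index=int(element_id[1:]),
        element_type=match.group("tag"),
        text=match.group("text"),
    )
```

With this sketch, `parse_element_line("B1[:]<button>Submit Form</button>")` yields an element with role `button` and index `1`, while a `_[:]` context line yields `None`.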
@@ -0,0 +1,42 @@
1
+ from pydantic import Field
2
+ from typing_extensions import override
3
+
4
+ from notte_agent.common.trajectory_history import (
5
+ TrajectoryHistory,
6
+ TrajectoryStep,
7
+ )
8
+ from notte_agent.falco.types import StepAgentOutput
9
+
10
+
11
+ class FalcoTrajectoryHistory(TrajectoryHistory[StepAgentOutput]):
12
+ steps: list[TrajectoryStep[StepAgentOutput]] = Field(default_factory=list)
13
+ max_error_length: int | None = None
14
+
15
+ @override
16
+ def perceive_step(
17
+ self,
18
+ step: TrajectoryStep[StepAgentOutput],
19
+ step_idx: int = 0,
20
+ include_ids: bool = False,
21
+ include_data: bool = True,
22
+ ) -> str:
23
+ action_msg = "\n".join([" - " + result.input.dump_str() for result in step.results])
24
+ status_msg = "\n".join(
25
+ [" - " + self.perceive_step_result(result, include_ids, include_data) for result in step.results]
26
+ )
27
+ return f"""
28
+ # Execution step {step_idx}
29
+ * state:
30
+ - page_summary: {step.agent_response.state.page_summary}
31
+ - previous_goal_status: {step.agent_response.state.previous_goal_status}
32
+ - previous_goal_eval: {step.agent_response.state.previous_goal_eval}
33
+ - memory: {step.agent_response.state.memory}
34
+ - next_goal: {step.agent_response.state.next_goal}
35
+ * selected actions:
36
+ {action_msg}
37
+ * execution results:
38
+ {status_msg}"""
39
+
40
+ @override
41
+ def add_output(self, output: StepAgentOutput) -> None:
42
+ self.steps.append(TrajectoryStep(agent_response=output, results=[]))
@@ -0,0 +1,132 @@
1
+ from typing import Any, Literal, TypeVar
2
+
3
+ from loguru import logger
4
+ from notte_core.controller.actions import BaseAction, ClickAction, CompletionAction
5
+ from notte_core.controller.space import ActionSpace
6
+ from pydantic import BaseModel, Field, create_model, field_serializer
7
+
8
+
9
+ class RelevantInteraction(BaseModel):
10
+ """Interaction ids that can be relevant to the next actions"""
11
+
12
+ id: str
13
+ reason: str
14
+
15
+
16
+ class AgentState(BaseModel):
17
+ """Current state of the agent"""
18
+
19
+ previous_goal_status: Literal["success", "failure", "unknown"]
20
+ previous_goal_eval: str
21
+ page_summary: str
22
+ relevant_interactions: list[RelevantInteraction]
23
+ memory: str
24
+ next_goal: str
25
+
26
+
27
+ # TODO: for later when we do a refactoring
28
+ class BetterAgentAction(BaseModel):
29
+ """Base class for agent actions with explicit action handling"""
30
+
31
+ action_name: str
32
+ parameters: dict[str, str | int | bool | None]
33
+
34
+ @classmethod
35
+ def from_action(cls, action: BaseAction) -> "BetterAgentAction":
36
+ return cls(action_name=action.name(), parameters=action.model_dump(exclude={"category", "id"}))
37
+
38
+ def to_action(self, space: ActionSpace) -> BaseAction:
39
+ action_cls = space.action_map.get(self.action_name)
40
+ if not action_cls:
41
+ raise ValueError(f"Unknown action type: {self.action_name}")
42
+ return action_cls(**self.parameters) # type: ignore[arg-type]
43
+
44
+
45
+ class AgentAction(BaseModel):
46
+ def to_action(self) -> BaseAction:
47
+ field_sets = self.model_fields_set
48
+ if len(field_sets) != 1:
49
+ raise ValueError(f"Expected exactly one action, found {len(field_sets)} in {self.model_dump_json()}")
50
+ action_name = list(field_sets)[0]
51
+ return getattr(self, action_name)
52
+
53
+
54
+ def create_agent_action_model() -> type[AgentAction]:
55
+ """Creates a Pydantic model from registered actions"""
56
+ space = ActionSpace(description="does not matter")
57
+ fields = {
58
+ name: (
59
+ ActionModel | None,
60
+ Field(default=None, description=ActionModel.model_json_schema()["properties"]["description"]["default"]),
61
+ )
62
+ for name, ActionModel in space.action_map.items()
63
+ }
64
+ return create_model(AgentAction.__name__, __base__=AgentAction, **fields) # type: ignore[call-overload]
65
+
66
+
67
+ TAgentAction = TypeVar("TAgentAction", bound=AgentAction)
68
+
69
+ _AgentAction: type[AgentAction] = create_agent_action_model()
70
+
71
+
72
+ class StepAgentOutput(BaseModel):
73
+ state: AgentState
74
+ actions: list[_AgentAction] = Field(min_length=1) # type: ignore[type-arg]
75
+
76
+ @field_serializer("actions")
77
+ def serialize_actions(self, actions: list[AgentAction], _info: Any) -> list[dict[str, Any]]:
78
+ return [action.to_action().dump_dict() for action in actions]
79
+
80
+ @property
81
+ def output(self) -> CompletionAction | None:
82
+ last_action: CompletionAction | None = getattr(self.actions[-1], CompletionAction.name()) # type: ignore[attr-defined]
83
+ if last_action is not None:
84
+ return CompletionAction(success=last_action.success, answer=last_action.answer)
85
+ return None
86
+
87
+ def get_actions(self, max_actions: int | None = None) -> list[BaseAction]:
88
+ actions: list[BaseAction] = []
89
+ # compute valid list of actions
90
+ raw_actions: list[AgentAction] = self.actions # type: ignore[type-assignment]
91
+ for i, _action in enumerate(raw_actions):
92
+ is_last = i == len(raw_actions) - 1
93
+ actions.append(_action.to_action())
94
+ if not is_last and max_actions is not None and i + 1 >= max_actions:
95
+ logger.warning(f"Max actions reached: {max_actions}. Skipping remaining actions.")
96
+ break
97
+ if not is_last and actions[-1].name() == ClickAction.name() and actions[-1].id.startswith("L"):
98
+ logger.warning(f"Removing all actions after link click: {actions[-1].id}")
99
+ # all actions after a link `L` should be removed from the list
100
+ break
101
+ return actions
102
+
103
+ def pretty_string(self, colors: bool = True) -> str:
104
+ status = self.state.previous_goal_status
105
+ status_emoji: str
106
+ match status:
107
+ case "unknown":
108
+ status_emoji = "❓"
109
+ case "success":
110
+ status_emoji = "✅"
111
+ case "failure":
112
+ status_emoji = "❌"
113
+
114
+ def surround_tags(s: str, tags: tuple[str, ...] = ("b", "blue")) -> str:
115
+ if not colors:
116
+ return s
117
+
118
+ start = "".join(f"<{tag}>" for tag in tags)
119
+ end = "".join(f"</{tag}>" for tag in reversed(tags))
120
+ return f"{start}{s}{end}"
121
+
122
+ action_str = ""
123
+ actions: list[AgentAction] = self.actions # type: ignore[reportUnknownMemberType]
124
+ for action in actions:
125
+ action_base: BaseAction = action.to_action()
126
+ action_str += f" ▶ {action_base.name()} with id {action_base.id}"
127
+ return f"""📝 {surround_tags("Current page:")} {self.state.page_summary}
128
+ 🔬 {surround_tags("Previous goal:")} {status_emoji} {self.state.previous_goal_eval}
129
+ 🧠 {surround_tags("Memory:")} {self.state.memory}
130
+ 🎯 {surround_tags("Next goal:")} {self.state.next_goal}
131
+ ⚡ {surround_tags("Taking action:")}
132
+ {action_str}"""
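The truncation rule in `get_actions` above (drop every action that follows a click on a link element, since the page is expected to change) can be shown in isolation. This is a simplified sketch under stated assumptions: `SketchAction` is a hypothetical stand-in for `BaseAction`, and the action name `"click"` is assumed rather than taken from `ClickAction.name()`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SketchAction:
    """Hypothetical stand-in for an action with a name and a target element id."""

    name: str
    id: str


def truncate_after_link_click(actions: list[SketchAction]) -> list[SketchAction]:
    """Keep actions up to and including the first click on a link (`L...`) element."""
    kept: list[SketchAction] = []
    for action in actions:
        kept.append(action)
        # A link click most likely navigates away, so later actions would target a stale page.
        if action.name == "click" and action.id.startswith("L"):
            break
    return kept
```

For example, a sequence of a fill on `I1`, a click on `L2`, and a click on `B3` would be reduced to the first two actions.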
File without changes
@@ -0,0 +1,180 @@
1
+ from collections.abc import Callable
2
+
3
+ from loguru import logger
4
+ from notte_browser.dom.locate import locate_element
5
+ from notte_browser.resolution import NodeResolutionPipe
6
+ from notte_browser.session import NotteSession, NotteSessionConfig
7
+ from notte_browser.vault import VaultScreetsScreenshotMask
8
+ from notte_browser.window import BrowserWindow
9
+ from notte_core.browser.observation import Observation
10
+ from notte_core.common.tracer import LlmUsageDictTracer
11
+ from notte_core.controller.actions import CompletionAction, InteractionAction
12
+ from notte_core.credentials.base import BaseVault
13
+ from notte_core.llms.engine import LLMEngine
14
+ from patchright.async_api import Locator
15
+ from typing_extensions import override
16
+
17
+ from notte_agent.common.base import BaseAgent
18
+ from notte_agent.common.config import AgentConfig
19
+ from notte_agent.common.conversation import Conversation
20
+ from notte_agent.common.parser import NotteStepAgentOutput
21
+ from notte_agent.common.types import AgentResponse
22
+ from notte_agent.falco.agent import FalcoAgent
23
+ from notte_agent.gufo.parser import GufoParser
24
+ from notte_agent.gufo.perception import GufoPerception
25
+ from notte_agent.gufo.prompt import GufoPrompt
26
+
27
+
28
+ class GufoAgentConfig(AgentConfig):
29
+ @classmethod
30
+ @override
31
+ def default_session(cls) -> NotteSessionConfig:
32
+ return NotteSessionConfig().use_llm()
33
+
34
+
35
+ class GufoAgent(BaseAgent):
36
+ """
37
+ A base agent implementation that coordinates between an LLM and the Notte environment.
38
+
39
+ This class demonstrates how to build an agent that can:
40
+ 1. Maintain a conversation with an LLM
41
+ 2. Execute actions in the Notte environment
42
+ 3. Parse and format responses between the LLM and Notte
43
+
44
+ To customize this agent:
45
+ 1. Implement your own Parser class to format observations and actions
46
+ 2. Modify the conversation flow in the run() method
47
+ 3. Adjust the step() method to handle LLM interactions
47
+ 4. Customize how step() executes actions through the Notte session
49
+
50
+ Args:
51
+ config (AgentConfig): The agent configuration (reasoning model, session settings, verbosity)
52
+ window (BrowserWindow | None): Optional browser window passed to the session
53
+ vault (BaseVault | None): Optional vault used to replace and hide credentials
54
+ step_callback (Callable[[str, NotteStepAgentOutput], None] | None): Optional callback
55
+ invoked with the task and the parsed LLM output at each step
56
+ """
57
+
58
+ def __init__(
59
+ self,
60
+ config: AgentConfig,
61
+ window: BrowserWindow | None = None,
62
+ vault: BaseVault | None = None,
63
+ step_callback: Callable[[str, NotteStepAgentOutput], None] | None = None,
64
+ ) -> None:
65
+ super().__init__(session=NotteSession(config=config.session, window=window))
66
+ self.step_callback: Callable[[str, NotteStepAgentOutput], None] | None = step_callback
67
+ self.tracer: LlmUsageDictTracer = LlmUsageDictTracer()
68
+ self.config: AgentConfig = config
69
+ self.vault: BaseVault | None = vault
70
+ self.llm: LLMEngine = LLMEngine(
71
+ model=config.reasoning_model,
72
+ tracer=self.tracer,
73
+ structured_output_retries=config.session.structured_output_retries,
74
+ verbose=self.config.verbose,
75
+ )
76
+ # Users should implement their own parser to customize how observations
77
+ # and actions are formatted for their specific LLM and use case
78
+ self.parser: GufoParser = GufoParser()
79
+ self.prompt: GufoPrompt = GufoPrompt(self.parser)
80
+ self.perception: GufoPerception = GufoPerception()
81
+ self.conv: Conversation = Conversation()
82
+
83
+ if self.vault is not None:
84
+ # hide vault leaked credentials within llm inputs
85
+ self.llm.structured_completion = self.vault.patch_structured_completion(0, self.vault.get_replacement_map)(
86
+ self.llm.structured_completion
87
+ )
88
+
89
+ async def reset(self):
90
+ await self.session.reset()
91
+ self.conv.reset()
92
+
93
+ def output(self, answer: str, success: bool) -> AgentResponse:
94
+ return AgentResponse(
95
+ answer=answer,
96
+ success=success,
97
+ session_trajectory=self.session.trajectory,
98
+ agent_trajectory=[],
99
+ llm_usage=self.tracer.usage,
100
+ )
101
+
102
+ async def step(self, task: str) -> CompletionAction | None:
103
+ # Processes the conversation history through the LLM to decide the next action.
104
+ # logger.info(f"🤖 LLM prompt:\n{self.conv.messages()}")
105
+ response: str = self.llm.single_completion(self.conv.messages())
106
+ self.conv.add_assistant_message(content=response)
107
+ logger.info(f"🤖 LLM response:\n{response}")
108
+ # Ask Notte to perform the selected action
109
+ parsed_response = self.parser.parse(response)
110
+
111
+ if parsed_response is None or parsed_response.action is None:
112
+ self.conv.add_user_message(content=self.prompt.env_rules())
113
+ return None
114
+
115
+ if self.step_callback is not None:
116
+ self.step_callback(task, parsed_response)
117
+
118
+ if parsed_response.completion is not None:
119
+ return parsed_response.completion
120
+ action = parsed_response.action
121
+ # Replace credentials if needed using the vault
122
+ if self.vault is not None and self.vault.contains_credentials(action):
123
+ action_with_selector = await NodeResolutionPipe.forward(action, self.session.snapshot)
124
+
125
+ if isinstance(action_with_selector, InteractionAction) and action_with_selector.selector is not None:
126
+ locator: Locator = await locate_element(self.session.window.page, action_with_selector.selector)
127
+ attrs = await FalcoAgent.compute_locator_attributes(locator)
128
+
129
+ assert isinstance(action_with_selector, InteractionAction) and action_with_selector.selector is not None
130
+
131
+ action = self.vault.replace_credentials(
132
+ action,
133
+ attrs,
134
+ self.session.snapshot,
135
+ )
136
+ # Execute the action
137
+ obs: Observation = await self.session.act(action)
138
+ text_obs = self.perception.perceive(obs)
139
+ self.conv.add_user_message(
140
+ content=f"""
141
+ {text_obs}
142
+ {self.prompt.select_action_rules()}
143
+ {self.prompt.completion_rules()}
144
+ """,
145
+ image=obs.screenshot if self.config.include_screenshot else None,
146
+ )
147
+ logger.info(f"🌌 Action successfully executed:\n{text_obs}")
148
+ return None
149
+
150
+ @override
151
+ async def run(self, task: str, url: str | None = None) -> AgentResponse:
152
+ """
153
+ Main execution loop that coordinates between the LLM and Notte environment.
154
+
155
+ This method shows a basic conversation flow. Consider customizing:
156
+ 1. The initial system prompt
157
+ 2. How observations are added to the conversation
158
+ 3. When and how to determine task completion
159
+ 4. Error handling and recovery strategies
160
+ """
161
+ logger.info(f"🚀 starting agent with task: {task} and url: {url}")
162
+ system_msg = self.prompt.system(task, url)
163
+ if self.vault is not None:
164
+ system_msg += "\n" + self.vault.instructions()
165
+ self.conv.add_system_message(content=system_msg)
166
+ self.conv.add_user_message(self.prompt.env_rules())
167
+ async with self.session:
168
+ if self.vault is not None:
169
+ self.session.window.screenshot_mask = VaultScreetsScreenshotMask(vault=self.vault)
170
+ for i in range(self.config.session.max_steps):
171
+ logger.info(f"> step {i}: looping in")
172
+ output = await self.step(task=task)
173
+ if output is not None:
174
+ status = "😎 task completed successfully" if output.success else "👿 task failed"
175
+ logger.info(f"{status} with answer: {output.answer}")
176
+ return self.output(output.answer, output.success)
177
+ # If the task is not done within max_steps, return a failure response
178
+ error_msg = f"Failed to solve task in {self.config.session.max_steps} steps"
179
+ logger.info(f"🚨 {error_msg}")
180
+ return self.output(error_msg, False)
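The control flow of `run` above reduces to a bounded loop: call `step` until it returns a completion, and report failure once the step budget is exhausted. The following is a minimal sketch of that pattern only. `SketchCompletion` and `run_steps` are hypothetical names, and the session, vault, and conversation handling of the real agent are deliberately left out.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class SketchCompletion:
    """Hypothetical stand-in for a completion result."""

    success: bool
    answer: str


async def run_steps(
    step: Callable[[], Awaitable[SketchCompletion | None]],
    max_steps: int,
) -> SketchCompletion:
    """Call `step` up to `max_steps` times and stop at the first completion."""
    for _ in range(max_steps):
        output = await step()
        if output is not None:
            return output
    # Budget exhausted: report failure instead of raising.
    return SketchCompletion(success=False, answer=f"Failed to solve task in {max_steps} steps")
```

A caller would drive it with `asyncio.run(run_steps(some_step, max_steps=10))`, where `some_step` is any coroutine function returning a completion or `None`.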
@@ -0,0 +1,79 @@
1
+ from typing import ClassVar, Literal
2
+
3
+ from notte_core.actions.base import ActionParameterValue, ExecutableAction
4
+ from notte_core.controller.actions import CompletionAction, GotoAction, ScrapeAction
5
+ from typing_extensions import override
6
+
7
+ from notte_agent.common.parser import BaseParser, NotteStepAgentOutput
8
+
9
+
10
+ class GufoParser(BaseParser):
11
+ observe_tag: ClassVar[str] = "observe"
12
+ step_tag: ClassVar[str] = "execute-action"
13
+ scrape_tag: ClassVar[str] = "scrape-data"
14
+ done_tag: ClassVar[str] = "done"
15
+
16
+ @override
17
+ def example_format(self, endpoint: Literal["observe", "step", "scrape", "done", "error"]) -> str | None:
18
+ match endpoint:
19
+ case "observe":
20
+ return f"""
21
+ <{self.observe_tag}>
22
+ {GotoAction(url="https://www.example.com").dump_str(name=False)}
23
+ </{self.observe_tag}>
24
+ """
25
+ case "step":
26
+ return f"""
27
+ <{self.step_tag}>
28
+ {
29
+ ExecutableAction(
30
+ id="<YOUR_ACTION_ID>",
31
+ params_values=[ActionParameterValue(name="<YOUR_PARAM_NAME>", value="<YOUR_PARAM_VALUE>")],
32
+ ).dump_str(name=False)
33
+ }
34
+ </{self.step_tag}>
35
+ """
36
+ case "scrape":
37
+ return f"""
38
+ <{self.scrape_tag}>
39
+ {ScrapeAction(instructions="<YOUR_SCRAPING_INSTRUCTIONS | null to scrape the whole page>").dump_str(name=False)}
40
+ </{self.scrape_tag}>
41
+ """
42
+ case "done":
43
+ return f"""
44
+ <{self.done_tag}>
45
+ {CompletionAction(success=True, answer="<YOUR_ANSWER>").dump_str(name=False)}
46
+ </{self.done_tag}>
47
+ """
48
+ case "error":
49
+ return f"""
50
+ <{self.done_tag}>
51
+ {CompletionAction(success=False, answer="<REASON_FOR_FAILURE>").dump_str(name=False)}
52
+ </{self.done_tag}>
53
+ """
54
+
55
+ @override
56
+ def parse(self, text: str) -> NotteStepAgentOutput | None:
57
+ url = self.search_pattern(text, GufoParser.observe_tag)
58
+ action = self.search_pattern(text, GufoParser.step_tag)
59
+ scrape = self.search_pattern(text, GufoParser.scrape_tag)
60
+ output = self.search_pattern(text, GufoParser.done_tag)
61
+ match (bool(url), bool(action), bool(scrape), bool(output)):
62
+ case (True, False, False, False):
63
+ return NotteStepAgentOutput(
64
+ observe=GotoAction.model_validate(self.parse_json(text, GufoParser.observe_tag))
65
+ )
66
+ case (False, True, False, False):
67
+ return NotteStepAgentOutput(
68
+ step=ExecutableAction.model_validate(self.parse_json(text, GufoParser.step_tag)),
69
+ )
70
+ case (False, False, True, False):
71
+ return NotteStepAgentOutput(
72
+ scrape=ScrapeAction.model_validate(self.parse_json(text, GufoParser.scrape_tag))
73
+ )
74
+ case (False, False, False, True):
75
+ return NotteStepAgentOutput(
76
+ completion=CompletionAction.model_validate(self.parse_json(text, GufoParser.done_tag))
77
+ )
78
+ case _:
79
+ return None
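The dispatch in `GufoParser.parse` above accepts a response only when exactly one of the four tags is present. Since `search_pattern` and `parse_json` live in `BaseParser` and are not shown in this diff, the sketch below substitutes a plain regular expression and `json.loads`; the helper names `extract_tag_body` and `parse_single_tag` are hypothetical and the real helpers may behave differently.

```python
from __future__ import annotations

import json
import re

# Tag names taken from the GufoParser class variables.
TAGS = ("observe", "execute-action", "scrape-data", "done")


def extract_tag_body(text: str, tag: str) -> str | None:
    """Return the stripped text between <tag> and </tag>, or None if the tag is absent."""
    escaped = re.escape(tag)
    match = re.search(rf"<{escaped}>(.*?)</{escaped}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


def parse_single_tag(text: str) -> tuple[str, dict] | None:
    """Return (tag, payload) when exactly one known tag holds a JSON object, else None."""
    found = [(tag, body) for tag in TAGS if (body := extract_tag_body(text, tag)) is not None]
    if len(found) != 1:
        return None
    tag, body = found[0]
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    return (tag, payload) if isinstance(payload, dict) else None
```

Rejecting responses with zero or several tags mirrors the `case _: return None` branch, which lets the agent re-send the environment rules instead of guessing which action was meant.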
@@ -0,0 +1,53 @@
1
+ from typing import final
2
+
3
+ from notte_core.browser.observation import Observation
4
+ from typing_extensions import override
5
+
6
+ from notte_agent.common.perception import BasePerception
7
+
8
+
9
+ @final
10
+ class GufoPerception(BasePerception):
11
+ @override
12
+ def perceive_metadata(self, obs: Observation) -> str:
13
+ space_description = obs.space.description
14
+ category: str = obs.space.category.value if obs.space.category is not None else ""
15
+ return f"""
16
+ Webpage information:
17
+ - URL: {obs.metadata.url}
18
+ - Title: {obs.metadata.title}
19
+ - Description: {space_description or "No description available"}
20
+ - Timestamp: {obs.metadata.timestamp.strftime("%Y-%m-%d %H:%M:%S")}
21
+ - Page category: {category or "No category available"}
22
+ """
23
+
24
+ @override
25
+ def perceive_data(
26
+ self,
27
+ obs: Observation,
28
+ ) -> str:
29
+ if not obs.has_data():
30
+ raise ValueError("No scraping data found")
31
+ return f"""
32
+ Here is some data that has been extracted from this page:
33
+ <data>
34
+ {obs.data.markdown if obs.data is not None else "No data available"}
35
+ </data>
36
+ """
37
+
38
+ @override
39
+ def perceive_actions(self, obs: Observation) -> str:
40
+ return f"""
41
+ Here are the available actions you can take on this page:
42
+ <actions>
43
+ {obs.space.markdown()}
44
+ </actions>
45
+ """
46
+
47
+ @override
48
+ def perceive(self, obs: Observation) -> str:
49
+ return f"""
50
+ {self.perceive_metadata(obs).strip()}
51
+ {self.perceive_data(obs).strip() if obs.has_data() else ""}
52
+ {self.perceive_actions(obs).strip()}
53
+ """
@@ -0,0 +1,61 @@
1
+ from pathlib import Path
2
+
3
+ import chevron
4
+
5
+ from notte_agent.gufo.parser import GufoParser
6
+
7
+ system_prompt_file = Path(__file__).parent / "system.md"
8
+
9
+
10
+ class GufoPrompt:
11
+ def __init__(self, parser: GufoParser):
12
+ self.parser: GufoParser = parser
13
+ self.system_prompt: str = system_prompt_file.read_text()
14
+
15
+ def system(self, task: str, url: str | None = None) -> str:
16
+ return chevron.render(self.system_prompt, {"task": task, "url": url or "the web"}, warn=True)
17
+
18
+ def env_rules(self) -> str:
19
+ return f"""
20
+ Hi there! I am the Notte web environment, and will help you navigate the internet.
21
+ # How it works:
22
+ * Provide me with a URL. I will respond with the actions you can take on that page.
23
+ * You are NOT allowed to provide me with more than one URL.
24
+ * Important: Make sure to use the **exact format** below when sending me a URL:
25
+ {self.parser.example_format("observe")}
26
+ > So, where would you like to go?
27
+ """
28
+
29
+ def completion_rules(self) -> str:
30
+ return f"""
31
+ # How to format your answer when you're done
32
+ ## Success answer
33
+ * If you're done, include your final answer in <{self.parser.done_tag}> tags.
34
+ * Don't forget to justify why your answer is correct and solves the task.
35
+ * Don't assume anything, just provide factual information backed up by the page you're on.
36
+ Format your answer as follows:
37
+ {self.parser.example_format("done")}
38
+
39
+ ## Error answer
40
+ * If you feel stuck, remember that you are also allowed to use `Special Browser Actions` at any time to:
41
+ * Go to a different url
42
+ * Go back to the previous page
43
+ * Refresh the current page
44
+ * Scrape data from the page
45
+ * Etc
46
+ * If you want to stop or you're unable to pursue your goal, format your answer as follows:
47
+ {self.parser.example_format("error")}
48
+ """
49
+
50
+ def select_action_rules(self) -> str:
51
+ return f"""
52
+ # Next Action Selection
53
+ * Provide me with the ID of the action you want to take next.
54
+ * You are allowed to take only exactly ONE action from the list.
55
+ * You are ONLY allowed to pick actions from the latest list of actions!
56
+ * You are NOT allowed to pick actions from list of actions in previous messages!
57
+ * If the action is parameterized, provide the value for each parameter.
58
+ Use the exact following format:
59
+
60
+ {self.parser.example_format("step")}
61
+ """
@@ -0,0 +1,8 @@
1
+ You are a helpful web agent.
2
+ Now you are given the task: {{task}}.
3
+ Please interact with {{url}} to get the answer.
4
+
5
+ Instructions:
6
+ - At every step, you will be provided with a list of actions you can take.
7
+ - If you are asked to accept cookies to continue, please accept them. Accepting cookies is MANDATORY.
8
+ - If you see one action about cookie management, you should stop thinking about the goal and accept cookies DIRECTLY.