notte-agent 0.0.dev0__py3-none-any.whl
- notte_agent/README.md +58 -0
- notte_agent/__init__.py +7 -0
- notte_agent/common/__init__.py +0 -0
- notte_agent/common/base.py +14 -0
- notte_agent/common/captcha_detector.py +87 -0
- notte_agent/common/config.py +219 -0
- notte_agent/common/conversation.py +246 -0
- notte_agent/common/notifier.py +55 -0
- notte_agent/common/parser.py +78 -0
- notte_agent/common/perception.py +21 -0
- notte_agent/common/prompt.py +15 -0
- notte_agent/common/safe_executor.py +100 -0
- notte_agent/common/trajectory_history.py +100 -0
- notte_agent/common/types.py +41 -0
- notte_agent/common/validator.py +90 -0
- notte_agent/falco/__init__.py +0 -0
- notte_agent/falco/agent.py +343 -0
- notte_agent/falco/perception.py +83 -0
- notte_agent/falco/prompt.py +132 -0
- notte_agent/falco/prompts/system_prompt_multi_actions.md +107 -0
- notte_agent/falco/prompts/system_prompt_single_action.md +107 -0
- notte_agent/falco/trajectory_history.py +42 -0
- notte_agent/falco/types.py +132 -0
- notte_agent/gufo/__init__.py +0 -0
- notte_agent/gufo/agent.py +180 -0
- notte_agent/gufo/parser.py +79 -0
- notte_agent/gufo/perception.py +53 -0
- notte_agent/gufo/prompt.py +61 -0
- notte_agent/gufo/system.md +8 -0
- notte_agent/main.py +77 -0
- notte_agent/py.typed +0 -0
- notte_agent-0.0.dev0.dist-info/METADATA +8 -0
- notte_agent-0.0.dev0.dist-info/RECORD +34 -0
- notte_agent-0.0.dev0.dist-info/WHEEL +4 -0
@@ -0,0 +1,107 @@
+You are a precise browser automation agent that interacts with websites through structured commands.
+Your role is to:
+1. Analyze the provided webpage elements and structure
+2. Plan a sequence of actions to accomplish the given task
+3. Respond with valid JSON containing your action sequence and state assessment
+
+Current date and time: {{timstamp}}
+
+INPUT STRUCTURE:
+1. Current URL: The webpage you're currently on
+2. Available Tabs: List of open browser tabs
+3. Interactive Elements: List in the format:
+id[:]<element_type>element_text</element_type>
+- `id`: identifier for interaction. `ids` can be decomposed into `<role_first_letter><index>[:]` where `<index>` is the index of the element in the list of elements with the same role and `<role_first_letter>` is one of:
+- `I` for input fields (textbox, select, checkbox, etc.)
+- `B` for buttons
+- `L` for links
+- `F` for figures and images
+- `O` for options in select elements
+- `M` for miscellaneous elements (e.g. modals, dialogs, etc.) that are only clickable for the most part.
+- `element_type`: HTML element type (button, input, etc.)
+- `element_text`: Visible text or element description
+
+Example:
+B1[:]<button>Submit Form</button>
+_[:] Non-interactive text
+
+
+Notes:
+- Only elements with `ids` are interactive
+- `_[:]` elements provide context but cannot be interacted with
+
+1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
+```json
+{{& example_step}}
+```
+
+
+2. ACTIONS: You are only allowed to choose one single action from the list to be executed.
+
+You will find below some common action sequences so that you can understand the flow of some tasks.
+IDs presented in those sequences correspond to interactive elements found in the page.
+You might encounter the same ids, but never assume them to exist, or have the same role.
+
+Common action sequences:
+- Form filling:
+```json
+{{& example_form_filling}}
+```
+- Navigation and extraction:
+```json
+{{& example_navigation_and_extraction}}
+```
+
+REMEMBER: You are NEVER allowed to specify multiple actions in the list of actions.
+
+
+3. ELEMENT INTERACTION:
+- Only use `ids` that exist in the provided element list
+- Each element has a unique `id` (e.g., `B2[:]<button>`)
+- Elements marked with `_[:]` are non-interactive (for context only)
+
+4. NAVIGATION & ERROR HANDLING:
+- If no suitable elements exist, use other functions to complete the task
+- If stuck, try alternative approaches
+- Handle popups/cookies by accepting or closing them
+- Use scroll to find elements you are looking for
+
+5. TASK COMPLETION:
+- Use the `{{completion_action_name}}` action as the last action as soon as the task is complete
+- Don't hallucinate actions
+- If the task requires specific information - make sure to include everything in the `{{completion_action_name}}` function. This is what the user will see.
+- If you are running out of steps (current step), think about speeding it up, and ALWAYS use the `{{completion_action_name}}` action as the last action.
+- Note that the `{{completion_action_name}}` can fail because an external validator failed to validate the output. If this happens, you should reflect on why the output is invalid and try to fix it.
+
+- Example of a successful `{{completion_action_name}}` action:
+```json
+{{& completion_example}}
+```
+
+6. VISUAL CONTEXT:
+- When an image is provided, use it to understand the page layout
+- Bounding boxes with labels correspond to element indexes
+- Each bounding box and its label have the same color
+- Most often the label is inside the bounding box, on the top right
+- Visual context helps verify element locations and relationships
+- Sometimes labels overlap, so use the context to verify the correct element
+
+7. Form filling:
+- If you fill an input field and your action sequence is interrupted, most often a list with suggestions popped up under the field and you need to first select the right element from the suggestion list.
+
+8. ACTION SEQUENCING:
+- Actions are executed in the order they appear in the list
+- Each action should logically follow from the previous one
+- If the page changes after an action, the sequence is interrupted and you get the new state.
+- If content only disappears the sequence continues.
+- Only provide the action sequence until you think the page will change.
+- Try to be efficient, e.g. fill forms at once, or chain actions where nothing changes on the page like saving, extracting, checkboxes...
+- NEVER use multiple actions in a single step (otherwise ONLY the first action will be executed)
+
+9. Long tasks:
+- If the task is long, keep track of the status in the memory. If the ultimate task requires several pieces of information, keep track of them in the memory as well.
+
+Functions:
+{{& action_description}}
+
+Remember: Your responses must be valid JSON matching the specified format. Each action in the sequence must be valid.
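The element notation described in the prompt above (`B1[:]<button>Submit Form</button>`, with `_[:]` marking context-only text) can be illustrated with a small parser. This is only a sketch for readers of the diff: `Element`, `ROLE_BY_LETTER`, and `parse_element_line` are hypothetical names that do not appear in the package, and the regular expression assumes one element per line with no nested tags.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Role letters as listed in the prompt (the role labels on the right are illustrative).
ROLE_BY_LETTER = {
    "I": "input",
    "B": "button",
    "L": "link",
    "F": "figure",
    "O": "option",
    "M": "misc",
}

# Matches lines such as `B1[:]<button>Submit Form</button>`.
ELEMENT_RE = re.compile(
    r"^(?P<id>[A-Z]\d+)\[:\]<(?P<tag>[\w-]+)>(?P<text>.*)</(?P=tag)>$"
)


@dataclass(frozen=True)
class Element:
    id: str
    role: str
    index: int
    tag: str
    text: str


def parse_element_line(line: str) -> Optional[Element]:
    """Return an Element for an interactive line, or None for a `_[:]` context line."""
    line = line.strip()
    if line.startswith("_[:]"):
        return None
    match = ELEMENT_RE.match(line)
    if match is None:
        raise ValueError(f"Unrecognized element line: {line!r}")
    element_id = match["id"]
    letter = element_id[0]
    if letter not in ROLE_BY_LETTER:
        raise ValueError(f"Unknown role letter {letter!r} in id {element_id!r}")
    return Element(
        id=element_id,
        role=ROLE_BY_LETTER[letter],
        index=int(element_id[1:]),
        tag=match["tag"],
        text=match["text"],
    )
```

For instance, `parse_element_line("B1[:]<button>Submit Form</button>")` yields an `Element` with role `button` and index `1`, while a `_[:]` line yields `None`.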
@@ -0,0 +1,42 @@
+from pydantic import Field
+from typing_extensions import override
+
+from notte_agent.common.trajectory_history import (
+    TrajectoryHistory,
+    TrajectoryStep,
+)
+from notte_agent.falco.types import StepAgentOutput
+
+
+class FalcoTrajectoryHistory(TrajectoryHistory[StepAgentOutput]):
+    steps: list[TrajectoryStep[StepAgentOutput]] = Field(default_factory=list)
+    max_error_length: int | None = None
+
+    @override
+    def perceive_step(
+        self,
+        step: TrajectoryStep[StepAgentOutput],
+        step_idx: int = 0,
+        include_ids: bool = False,
+        include_data: bool = True,
+    ) -> str:
+        action_msg = "\n".join([" - " + result.input.dump_str() for result in step.results])
+        status_msg = "\n".join(
+            [" - " + self.perceive_step_result(result, include_ids, include_data) for result in step.results]
+        )
+        return f"""
+# Execution step {step_idx}
+* state:
+- page_summary: {step.agent_response.state.page_summary}
+- previous_goal_status: {step.agent_response.state.previous_goal_status}
+- previous_goal_eval: {step.agent_response.state.previous_goal_eval}
+- memory: {step.agent_response.state.memory}
+- next_goal: {step.agent_response.state.next_goal}
+* selected actions:
+{action_msg}
+* execution results:
+{status_msg}"""
+
+    @override
+    def add_output(self, output: StepAgentOutput) -> None:
+        self.steps.append(TrajectoryStep(agent_response=output, results=[]))
@@ -0,0 +1,132 @@
+from typing import Any, Literal, TypeVar
+
+from loguru import logger
+from notte_core.controller.actions import BaseAction, ClickAction, CompletionAction
+from notte_core.controller.space import ActionSpace
+from pydantic import BaseModel, Field, create_model, field_serializer
+
+
+class RelevantInteraction(BaseModel):
+    """Interaction ids that can be relevant to the next actions"""
+
+    id: str
+    reason: str
+
+
+class AgentState(BaseModel):
+    """Current state of the agent"""
+
+    previous_goal_status: Literal["success", "failure", "unknown"]
+    previous_goal_eval: str
+    page_summary: str
+    relevant_interactions: list[RelevantInteraction]
+    memory: str
+    next_goal: str
+
+
+# TODO: for later when we do a refactoring
+class BetterAgentAction(BaseModel):
+    """Base class for agent actions with explicit action handling"""
+
+    action_name: str
+    parameters: dict[str, str | int | bool | None]
+
+    @classmethod
+    def from_action(cls, action: BaseAction) -> "BetterAgentAction":
+        return cls(action_name=action.name(), parameters=action.model_dump(exclude={"category", "id"}))
+
+    def to_action(self, space: ActionSpace) -> BaseAction:
+        action_cls = space.action_map.get(self.action_name)
+        if not action_cls:
+            raise ValueError(f"Unknown action type: {self.action_name}")
+        return action_cls(**self.parameters)  # type: ignore[arg-type]
+
+
+class AgentAction(BaseModel):
+    def to_action(self) -> BaseAction:
+        field_sets = self.model_fields_set
+        if len(field_sets) != 1:
+            raise ValueError(f"Expected exactly one action, found {len(field_sets)} in {self.model_dump_json()}")
+        action_name = list(field_sets)[0]
+        return getattr(self, action_name)
+
+
+def create_agent_action_model() -> type[AgentAction]:
+    """Creates a Pydantic model from registered actions"""
+    space = ActionSpace(description="does not matter")
+    fields = {
+        name: (
+            ActionModel | None,
+            Field(default=None, description=ActionModel.model_json_schema()["properties"]["description"]["default"]),
+        )
+        for name, ActionModel in space.action_map.items()
+    }
+    return create_model(AgentAction.__name__, __base__=AgentAction, **fields)  # type: ignore[call-overload]
+
+
+TAgentAction = TypeVar("TAgentAction", bound=AgentAction)
+
+_AgentAction: type[AgentAction] = create_agent_action_model()
+
+
+class StepAgentOutput(BaseModel):
+    state: AgentState
+    actions: list[_AgentAction] = Field(min_length=1)  # type: ignore[type-arg]
+
+    @field_serializer("actions")
+    def serialize_actions(self, actions: list[AgentAction], _info: Any) -> list[dict[str, Any]]:
+        return [action.to_action().dump_dict() for action in actions]
+
+    @property
+    def output(self) -> CompletionAction | None:
+        last_action: CompletionAction | None = getattr(self.actions[-1], CompletionAction.name())  # type: ignore[attr-defined]
+        if last_action is not None:
+            return CompletionAction(success=last_action.success, answer=last_action.answer)
+        return None
+
+    def get_actions(self, max_actions: int | None = None) -> list[BaseAction]:
+        actions: list[BaseAction] = []
+        # compute valid list of actions
+        raw_actions: list[AgentAction] = self.actions  # type: ignore[type-assignment]
+        for i, _action in enumerate(raw_actions):
+            is_last = i == len(raw_actions) - 1
+            actions.append(_action.to_action())
+            if not is_last and max_actions is not None and i >= max_actions:
+                logger.warning(f"Max actions reached: {max_actions}. Skipping remaining actions.")
+                break
+            if not is_last and actions[-1].name() == ClickAction.name() and actions[-1].id.startswith("L"):
+                logger.warning(f"Removing all actions after link click: {actions[-1].id}")
+                # all actions after a link `L` should be removed from the list
+                break
+        return actions
+
+    def pretty_string(self, colors: bool = True) -> str:
+        status = self.state.previous_goal_status
+        status_emoji: str
+        match status:
+            case "unknown":
+                status_emoji = "❓"
+            case "success":
+                status_emoji = "✅"
+            case "failure":
+                status_emoji = "❌"
+
+        def surround_tags(s: str, tags: tuple[str, ...] = ("b", "blue")) -> str:
+            if not colors:
+                return s
+
+            start = "".join(f"<{tag}>" for tag in tags)
+            end = "".join(f"</{tag}>" for tag in reversed(tags))
+            return f"{start}{s}{end}"
+
+        action_str = ""
+        actions: list[AgentAction] = self.actions  # type: ignore[reportUnknownMemberType]
+        for action in actions:
+            action_base: BaseAction = action.to_action()
+            action_str += f" ▶ {action_base.name()} with id {action_base.id}"
+        return f"""📝 {surround_tags("Current page:")} {self.state.page_summary}
+🔬 {surround_tags("Previous goal:")} {status_emoji} {self.state.previous_goal_eval}
+🧠 {surround_tags("Memory:")} {self.state.memory}
+🎯 {surround_tags("Next goal:")} {status_emoji if False else ""}{self.state.next_goal}
+⚡ {surround_tags("Taking action:")}
+{action_str}"""
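The truncation rule in `get_actions` above (everything after a click on a link id starting with `L` is dropped, presumably because navigation invalidates the remaining ids) can be shown in isolation. The sketch below is a simplified stand-in, not the package's code: `Action`, the `"click"` name, and `truncate_after_link_click` are hypothetical, and the `max_actions` cap is left out on purpose.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical stand-in for a resolved browser action."""

    name: str
    id: str = ""


def truncate_after_link_click(actions: list[Action]) -> list[Action]:
    """Keep actions up to and including the first click on a link (`L...` id)."""
    kept: list[Action] = []
    for action in actions:
        kept.append(action)
        # A link click is assumed to load a new page, so later ids may no longer exist.
        if action.name == "click" and action.id.startswith("L"):
            break
    return kept
```

With the input `[fill I1, click L3, click B2]`, only `fill I1` and `click L3` are kept.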
File without changes
@@ -0,0 +1,180 @@
+from collections.abc import Callable
+
+from loguru import logger
+from notte_browser.dom.locate import locate_element
+from notte_browser.resolution import NodeResolutionPipe
+from notte_browser.session import NotteSession, NotteSessionConfig
+from notte_browser.vault import VaultScreetsScreenshotMask
+from notte_browser.window import BrowserWindow
+from notte_core.browser.observation import Observation
+from notte_core.common.tracer import LlmUsageDictTracer
+from notte_core.controller.actions import CompletionAction, InteractionAction
+from notte_core.credentials.base import BaseVault
+from notte_core.llms.engine import LLMEngine
+from patchright.async_api import Locator
+from typing_extensions import override
+
+from notte_agent.common.base import BaseAgent
+from notte_agent.common.config import AgentConfig
+from notte_agent.common.conversation import Conversation
+from notte_agent.common.parser import NotteStepAgentOutput
+from notte_agent.common.types import AgentResponse
+from notte_agent.falco.agent import FalcoAgent
+from notte_agent.gufo.parser import GufoParser
+from notte_agent.gufo.perception import GufoPerception
+from notte_agent.gufo.prompt import GufoPrompt
+
+
+class GufoAgentConfig(AgentConfig):
+    @classmethod
+    @override
+    def default_session(cls) -> NotteSessionConfig:
+        return NotteSessionConfig().use_llm()
+
+
+class GufoAgent(BaseAgent):
+    """
+    A base agent implementation that coordinates between an LLM and the Notte environment.
+
+    This class demonstrates how to build an agent that can:
+    1. Maintain a conversation with an LLM
+    2. Execute actions in the Notte environment
+    3. Parse and format responses between the LLM and Notte
+
+    To customize this agent:
+    1. Implement your own Parser class to format observations and actions
+    2. Modify the conversation flow in the run() method
+    3. Adjust the step() method to change how LLM responses are requested, parsed, and executed
+
+    Args:
+        config (AgentConfig): Agent configuration (reasoning model, session settings, verbosity, screenshots)
+        window (BrowserWindow | None): Browser window passed to the Notte session
+        vault (BaseVault | None): Credential vault used to replace and mask credentials
+        step_callback (Callable | None): Called with the task and the parsed LLM output on each step
+    """
+
+    def __init__(
+        self,
+        config: AgentConfig,
+        window: BrowserWindow | None = None,
+        vault: BaseVault | None = None,
+        step_callback: Callable[[str, NotteStepAgentOutput], None] | None = None,
+    ) -> None:
+        super().__init__(session=NotteSession(config=config.session, window=window))
+        self.step_callback: Callable[[str, NotteStepAgentOutput], None] | None = step_callback
+        self.tracer: LlmUsageDictTracer = LlmUsageDictTracer()
+        self.config: AgentConfig = config
+        self.vault: BaseVault | None = vault
+        self.llm: LLMEngine = LLMEngine(
+            model=config.reasoning_model,
+            tracer=self.tracer,
+            structured_output_retries=config.session.structured_output_retries,
+            verbose=self.config.verbose,
+        )
+        # Users should implement their own parser to customize how observations
+        # and actions are formatted for their specific LLM and use case
+        self.parser: GufoParser = GufoParser()
+        self.prompt: GufoPrompt = GufoPrompt(self.parser)
+        self.perception: GufoPerception = GufoPerception()
+        self.conv: Conversation = Conversation()
+
+        if self.vault is not None:
+            # hide vault leaked credentials within llm inputs
+            self.llm.structured_completion = self.vault.patch_structured_completion(0, self.vault.get_replacement_map)(
+                self.llm.structured_completion
+            )
+
+    async def reset(self):
+        await self.session.reset()
+        self.conv.reset()
+
+    def output(self, answer: str, success: bool) -> AgentResponse:
+        return AgentResponse(
+            answer=answer,
+            success=success,
+            session_trajectory=self.session.trajectory,
+            agent_trajectory=[],
+            llm_usage=self.tracer.usage,
+        )
+
+    async def step(self, task: str) -> CompletionAction | None:
+        # Processes the conversation history through the LLM to decide the next action.
+        # logger.info(f"🤖 LLM prompt:\n{self.conv.messages()}")
+        response: str = self.llm.single_completion(self.conv.messages())
+        self.conv.add_assistant_message(content=response)
+        logger.info(f"🤖 LLM response:\n{response}")
+        # Ask Notte to perform the selected action
+        parsed_response = self.parser.parse(response)
+
+        if parsed_response is None or parsed_response.action is None:
+            self.conv.add_user_message(content=self.prompt.env_rules())
+            return None
+
+        if self.step_callback is not None:
+            self.step_callback(task, parsed_response)
+
+        if parsed_response.completion is not None:
+            return parsed_response.completion
+        action = parsed_response.action
+        # Replace credentials if needed using the vault
+        if self.vault is not None and self.vault.contains_credentials(action):
+            action_with_selector = await NodeResolutionPipe.forward(action, self.session.snapshot)
+
+            if isinstance(action_with_selector, InteractionAction) and action_with_selector.selector is not None:
+                locator: Locator = await locate_element(self.session.window.page, action_with_selector.selector)
+                attrs = await FalcoAgent.compute_locator_attributes(locator)
+
+                assert isinstance(action_with_selector, InteractionAction) and action_with_selector.selector is not None
+
+                action = self.vault.replace_credentials(
+                    action,
+                    attrs,
+                    self.session.snapshot,
+                )
+        # Execute the action
+        obs: Observation = await self.session.act(action)
+        text_obs = self.perception.perceive(obs)
+        self.conv.add_user_message(
+            content=f"""
+{text_obs}
+{self.prompt.select_action_rules()}
+{self.prompt.completion_rules()}
+""",
+            image=obs.screenshot if self.config.include_screenshot else None,
+        )
+        logger.info(f"🌌 Action successfully executed:\n{text_obs}")
+        return None
+
+    @override
+    async def run(self, task: str, url: str | None = None) -> AgentResponse:
+        """
+        Main execution loop that coordinates between the LLM and Notte environment.
+
+        This method shows a basic conversation flow. Consider customizing:
+        1. The initial system prompt
+        2. How observations are added to the conversation
+        3. When and how to determine task completion
+        4. Error handling and recovery strategies
+        """
+        logger.info(f"🚀 starting agent with task: {task} and url: {url}")
+        system_msg = self.prompt.system(task, url)
+        if self.vault is not None:
+            system_msg += "\n" + self.vault.instructions()
+        self.conv.add_system_message(content=system_msg)
+        self.conv.add_user_message(self.prompt.env_rules())
+        async with self.session:
+            if self.vault is not None:
+                self.session.window.screenshot_mask = VaultScreetsScreenshotMask(vault=self.vault)
+            for i in range(self.config.session.max_steps):
+                logger.info(f"> step {i}: looping in")
+                output = await self.step(task=task)
+                if output is not None:
+                    status = "😎 task completed successfully" if output.success else "👿 task failed"
+                    logger.info(f"{status} with answer: {output.answer}")
+                    return self.output(output.answer, output.success)
+            # If the task is not done within max_steps, report a failed response
+            error_msg = f"Failed to solve task in {self.config.session.max_steps} steps"
+            logger.info(f"🚨 {error_msg}")
+            return self.output(error_msg, False)
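The control flow of `run()` above reduces to a bounded loop: call `step()` until it returns a completion, and report failure once the step budget is spent. The following sketch isolates that pattern with no browser or LLM involved. `Completion`, `run_loop`, and `make_fake_step` are hypothetical names introduced for illustration only; they are not part of `notte_agent`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Completion:
    """Hypothetical stand-in for a completion action."""

    success: bool
    answer: str


StepFn = Callable[[], Awaitable[Optional[Completion]]]


async def run_loop(step: StepFn, max_steps: int) -> Completion:
    """Call `step` up to `max_steps` times and stop at the first completion."""
    for _ in range(max_steps):
        result = await step()
        if result is not None:
            return result
    # Budget exhausted: return a failed completion instead of raising.
    return Completion(success=False, answer=f"Failed to solve task in {max_steps} steps")


def make_fake_step(complete_on: int) -> StepFn:
    """Build a fake step that completes on its `complete_on`-th call."""
    calls = 0

    async def step() -> Optional[Completion]:
        nonlocal calls
        calls += 1
        return Completion(True, "done") if calls == complete_on else None

    return step
```

Running `asyncio.run(run_loop(make_fake_step(3), max_steps=5))` returns the successful completion from the third call, whereas a budget of two steps produces the failure message.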
@@ -0,0 +1,79 @@
+from typing import ClassVar, Literal
+
+from notte_core.actions.base import ActionParameterValue, ExecutableAction
+from notte_core.controller.actions import CompletionAction, GotoAction, ScrapeAction
+from typing_extensions import override
+
+from notte_agent.common.parser import BaseParser, NotteStepAgentOutput
+
+
+class GufoParser(BaseParser):
+    observe_tag: ClassVar[str] = "observe"
+    step_tag: ClassVar[str] = "execute-action"
+    scrape_tag: ClassVar[str] = "scrape-data"
+    done_tag: ClassVar[str] = "done"
+
+    @override
+    def example_format(self, endpoint: Literal["observe", "step", "scrape", "done", "error"]) -> str | None:
+        match endpoint:
+            case "observe":
+                return f"""
+<{self.observe_tag}>
+{GotoAction(url="https://www.example.com").dump_str(name=False)}
+</{self.observe_tag}>
+"""
+            case "step":
+                return f"""
+<{self.step_tag}>
+{
+                    ExecutableAction(
+                        id="<YOUR_ACTION_ID>",
+                        params_values=[ActionParameterValue(name="<YOUR_PARAM_NAME>", value="<YOUR_PARAM_VALUE>")],
+                    ).dump_str(name=False)
+                }
+</{self.step_tag}>
+"""
+            case "scrape":
+                return f"""
+<{self.scrape_tag}>
+{ScrapeAction(instructions="<YOUR_SCRAPING_INSTRUCTIONS | null to scrape the whole page>").dump_str(name=False)}
+</{self.scrape_tag}>
+"""
+            case "done":
+                return f"""
+<{self.done_tag}>
+{CompletionAction(success=True, answer="<YOUR_ANSWER>").dump_str(name=False)}
+</{self.done_tag}>
+"""
+            case "error":
+                return f"""
+<{self.done_tag}>
+{CompletionAction(success=False, answer="<REASON_FOR_FAILURE>").dump_str(name=False)}
+</{self.done_tag}>
+"""
+
+    @override
+    def parse(self, text: str) -> NotteStepAgentOutput | None:
+        url = self.search_pattern(text, GufoParser.observe_tag)
+        action = self.search_pattern(text, GufoParser.step_tag)
+        scrape = self.search_pattern(text, GufoParser.scrape_tag)
+        output = self.search_pattern(text, GufoParser.done_tag)
+        match (bool(url), bool(action), bool(scrape), bool(output)):
+            case (True, False, False, False):
+                return NotteStepAgentOutput(
+                    observe=GotoAction.model_validate(self.parse_json(text, GufoParser.observe_tag))
+                )
+            case (False, True, False, False):
+                return NotteStepAgentOutput(
+                    step=ExecutableAction.model_validate(self.parse_json(text, GufoParser.step_tag)),
+                )
+            case (False, False, True, False):
+                return NotteStepAgentOutput(
+                    scrape=ScrapeAction.model_validate(self.parse_json(text, GufoParser.scrape_tag))
+                )
+            case (False, False, False, True):
+                return NotteStepAgentOutput(
+                    completion=CompletionAction.model_validate(self.parse_json(text, GufoParser.done_tag))
+                )
+            case _:
+                return None
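`GufoParser.parse` above accepts a response only when exactly one of the four tags is present. Since `search_pattern` and `parse_json` live in `BaseParser`, which is not shown in this part of the diff, the sketch below substitutes plain `re` and `json` calls to show the same "exactly one tag" rule. `search_tag` and `parse_single_tag` are hypothetical helpers, and the real base-class behavior may differ.

```python
import json
import re
from typing import Any, Optional

# Tag names taken from the GufoParser class variables.
TAGS = ("observe", "execute-action", "scrape-data", "done")


def search_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped inner text of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None


def parse_single_tag(text: str) -> Optional[tuple[str, Any]]:
    """Return (tag, decoded JSON) when exactly one known tag has content, else None."""
    # Empty bodies are treated as absent, mirroring the bool(...) checks in parse().
    found = {tag: body for tag in TAGS if (body := search_tag(text, tag))}
    if len(found) != 1:
        return None
    tag, body = next(iter(found.items()))
    return tag, json.loads(body)
```

A response containing both an `<observe>` and a `<done>` block is rejected, which is the case where the agent re-sends its environment rules. Note that `json.loads` raises on malformed JSON, so a caller would need to decide how to handle that.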
@@ -0,0 +1,53 @@
+from typing import final
+
+from notte_core.browser.observation import Observation
+from typing_extensions import override
+
+from notte_agent.common.perception import BasePerception
+
+
+@final
+class GufoPerception(BasePerception):
+    @override
+    def perceive_metadata(self, obs: Observation) -> str:
+        space_description = obs.space.description
+        category: str = obs.space.category.value if obs.space.category is not None else ""
+        return f"""
+Webpage information:
+- URL: {obs.metadata.url}
+- Title: {obs.metadata.title}
+- Description: {space_description or "No description available"}
+- Timestamp: {obs.metadata.timestamp.strftime("%Y-%m-%d %H:%M:%S")}
+- Page category: {category or "No category available"}
+"""
+
+    @override
+    def perceive_data(
+        self,
+        obs: Observation,
+    ) -> str:
+        if not obs.has_data():
+            raise ValueError("No scraping data found")
+        return f"""
+Here is some data that has been extracted from this page:
+<data>
+{obs.data.markdown if obs.data is not None else "No data available"}
+</data>
+"""
+
+    @override
+    def perceive_actions(self, obs: Observation) -> str:
+        return f"""
+Here are the available actions you can take on this page:
+<actions>
+{obs.space.markdown()}
+</actions>
+"""
+
+    @override
+    def perceive(self, obs: Observation) -> str:
+        return f"""
+{self.perceive_metadata(obs).strip()}
+{self.perceive_data(obs).strip() if obs.has_data() else ""}
+{self.perceive_actions(obs).strip()}
+"""
@@ -0,0 +1,61 @@
+from pathlib import Path
+
+import chevron
+
+from notte_agent.gufo.parser import GufoParser
+
+system_prompt_file = Path(__file__).parent / "system.md"
+
+
+class GufoPrompt:
+    def __init__(self, parser: GufoParser):
+        self.parser: GufoParser = parser
+        self.system_prompt: str = system_prompt_file.read_text()
+
+    def system(self, task: str, url: str | None = None) -> str:
+        return chevron.render(self.system_prompt, {"task": task, "url": url or "the web"}, warn=True)
+
+    def env_rules(self) -> str:
+        return f"""
+Hi there! I am the Notte web environment, and will help you navigate the internet.
+# How it works:
+* Provide me with a URL. I will respond with the actions you can take on that page.
+* You are NOT allowed to provide me with more than one URL.
+* Important: Make sure to use the **exact format** below when sending me a URL:
+{self.parser.example_format("observe")}
+> So, where would you like to go?
+"""
+
+    def completion_rules(self) -> str:
+        return f"""
+# How to format your answer when you're done
+## Success answer
+* If you're done, include your final answer in <{self.parser.done_tag}> tags.
+* Don't forget to justify why your answer is correct and solves the task.
+* Don't assume anything, just provide factual information backed up by the page you're on.
+Format your answer as follows:
+{self.parser.example_format("done")}
+
+## Error answer
+* If you feel stuck, remember that you are also allowed to use `Special Browser Actions` at any time to:
+* Go to a different url
+* Go back to the previous page
+* Refresh the current page
+* Scrape data from the page
+* Etc
+* If you want to stop or you're unable to pursue your goal, format your answer as follows:
+{self.parser.example_format("error")}
+"""
+
+    def select_action_rules(self) -> str:
+        return f"""
+# Next Action Selection
+* Provide me with the ID of the action you want to take next.
+* You are allowed to take exactly ONE action from the list.
+* You are ONLY allowed to pick actions from the latest list of actions!
+* You are NOT allowed to pick actions from lists of actions in previous messages!
+* If the action is parameterized, provide the value for each parameter.
+Use the exact following format:
+
+{self.parser.example_format("step")}
+"""
@@ -0,0 +1,8 @@
+You are a helpful web agent.
+Now you are given the task: {{task}}.
+Please interact with: {{url}} to get the answer.
+
+Instructions:
+- At every step, you will be provided with a list of actions you can take.
+- If you are asked to accept cookies to continue, please accept them. Accepting cookies is MANDATORY.
+- If you see one action about cookie management, you should stop thinking about the goal and accept cookies DIRECTLY.
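The templates in this diff use two Mustache tag forms: `{{task}}` and `{{url}}` here, and `{{& example_step}}` in the Falco prompt. Under the Mustache specification, the plain form HTML-escapes the value while the `&` form inserts it verbatim, which would explain why the JSON examples use `&`. The sketch below imitates that distinction with the standard library only. It is a simplified stand-in and not `chevron`: the `render` function is hypothetical, it handles only these two variable forms, and `html.escape` may escape a slightly different character set than a given Mustache implementation.

```python
import html
import re

# `{{name}}` (escaped) or `{{& name}}` (raw); sections and partials are not handled.
_VAR_RE = re.compile(r"\{\{(&?)\s*(\w+)\s*\}\}")


def render(template: str, data: dict[str, str]) -> str:
    """Substitute Mustache-style variables, HTML-escaping unless the tag uses `&`."""

    def substitute(match: re.Match) -> str:
        raw = match.group(1) == "&"
        # Missing keys render as an empty string, as in Mustache.
        value = str(data.get(match.group(2), ""))
        return value if raw else html.escape(value)

    return _VAR_RE.sub(substitute, template)
```

One consequence worth checking against the real renderer: a task such as `a < b` passed through the escaped `{{task}}` form would reach the model as `a &lt; b`.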