web_browser.py

Classes

BrowseWebWithImagePromptsBot

BrowseWebWithImagePromptsBot is a base class for creating bots that interact with web pages using Playwright. The bot can perform actions such as navigating, clicking, scrolling, typing text, and taking screenshots. It also supports cookie management to maintain session state across interactions.

Methods:

__init__(session_id, website_name, browser_type='chromium', headless=True): Initializes the bot with the given session ID, website name, browser type, and headless mode. Supported browser types: 'chromium', 'firefox', 'webkit'.
load_cookies(): Loads cookies from a file and adds them to the browser context.
save_cookies(): Saves the current cookies to a file.
navigate(url): Navigates to the specified URL.
click(selector): Clicks on the element specified by the selector.
scroll(direction='down', amount=1): Scrolls the page in the specified direction ('down', 'up', 'left', 'right') by the specified amount.
type_text(selector, text): Types the specified text into the element specified by the selector.
take_screenshot(): Takes a screenshot and saves it with a timestamp in the session-specific directory. Returns the path to the screenshot.
get_latest_screenshot_path(): Retrieves the path to the most recent screenshot in the session-specific directory.
create_prompt_vars(current_action_description, session_goal): Creates a dictionary of prompt variables from the current action description and session goal.
send_screenshot_to_llm(screenshot_path, current_action_description="", session_goal=""): Encodes the screenshot in base64, creates prompt variables, and sends them to the LLM. Returns the new instructions from the LLM.
send_prompt_to_llm(prompt_vars, screenshot_base64): Abstract method to be implemented by subclasses. Sends the prompt variables and screenshot to the LLM and returns the response.
close(): Saves cookies, closes the browser, and stops Playwright.
execute_instructions(instructions): Executes the given set of instructions, takes a screenshot after each step, and sends the screenshot to the LLM for further instructions.
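
The instruction format is not spelled out in this reference; the only example shown on this page is a dictionary with an 'action' key plus arguments, such as {'action': 'navigate', 'url': ...}. The following is a minimal sketch of how such a list might be dispatched to the methods listed above. It assumes that every key other than 'action' maps to a keyword argument of the method with the same name. RecordingBot and dispatch are hypothetical names used only for illustration; the real execute_instructions also takes a screenshot and consults the LLM after each step, which this sketch leaves out.

```python
from typing import Any, Dict, List


class RecordingBot:
    """Hypothetical stand-in for the bot: records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def navigate(self, url: str) -> None:
        self.calls.append(("navigate", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def scroll(self, direction: str, amount: int) -> None:
        self.calls.append(("scroll", direction, amount))

    def type_text(self, selector: str, text: str) -> None:
        self.calls.append(("type_text", selector, text))


# Only these method names may be triggered by an instruction.
ALLOWED_ACTIONS = {"navigate", "click", "scroll", "type_text"}


def dispatch(bot: Any, instructions: List[Dict[str, Any]]) -> None:
    """Call the bot method named by each instruction's 'action' key.

    Assumption: all remaining keys are keyword arguments of that method.
    """
    for step in instructions:
        kwargs = dict(step)  # copy so the caller's dictionary is not mutated
        action = kwargs.pop("action")
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {action!r}")
        getattr(bot, action)(**kwargs)


bot = RecordingBot()
dispatch(bot, [
    {"action": "navigate", "url": "https://example.com"},
    {"action": "type_text", "selector": "#q", "text": "hello"},
    {"action": "scroll", "direction": "down", "amount": 2},
])
print(bot.calls)
```

Restricting dispatch to an explicit set of action names keeps LLM-produced instructions from reaching arbitrary attributes of the bot.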

Example usage:

import requests
from flask import Flask, jsonify, request

# Assumption: the class is importable from the web_browser module documented here.
from web_browser import BrowseWebWithImagePromptsBot

app = Flask(__name__)

class ProductionBot(BrowseWebWithImagePromptsBot):
    def send_prompt_to_llm(self, prompt_vars, screenshot_base64):
        # Send the prompt variables and the base64-encoded screenshot to the LLM and return its response
        api_url = "https://api.example.com/process"  # Replace with the actual LLM API endpoint
        data = {
            "prompt": prompt_vars,
            "screenshot": screenshot_base64
        }
        response = requests.post(api_url, json=data, timeout=30)  # json= sets the Content-Type header
        response.raise_for_status()  # Fail loudly on HTTP errors
        return response.json()  # Assuming the response is in JSON format

@app.route('/run-bot', methods=['POST'])
def run_bot():
    data = request.json
    session_id = data.get('session_id')
    website_name = data.get('website_name')
    browser_type = data.get('browser_type', 'chromium')
    current_action_description = data.get('current_action_description', "")
    session_goal = data.get('session_goal', "")
    
    bot = ProductionBot(session_id=session_id, website_name=website_name, browser_type=browser_type, headless=True)
    
    # Check if initial instructions are provided
    initial_instructions = data.get('instructions')
    if initial_instructions:
        bot.execute_instructions(initial_instructions)
    else:
        bot.execute_instructions([{'action':'navigate', 'url': website_name}])
    
    # Take initial screenshot and send to LLM
    screenshot_path = bot.take_screenshot()
    new_instructions = bot.send_screenshot_to_llm(screenshot_path, current_action_description, session_goal)
    bot.execute_instructions(new_instructions)
    
    # Take final screenshot
    bot.take_screenshot()
    
    bot.close()
    
    return jsonify({"status": "completed", "new_instructions": new_instructions})

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080)
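
A client for the /run-bot route above might look like the following sketch. The field names are taken from the handler; the helper name build_run_bot_request, the localhost base URL, and the sample values are assumptions made for illustration. The request is only constructed here, not sent, so the snippet runs without a server.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_run_bot_request(
    base_url: str,
    session_id: str,
    website_name: str,
    browser_type: str = "chromium",
    session_goal: str = "",
    current_action_description: str = "",
    instructions: Optional[List[Dict[str, Any]]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /run-bot endpoint."""
    payload: Dict[str, Any] = {
        "session_id": session_id,
        "website_name": website_name,
        "browser_type": browser_type,
        "session_goal": session_goal,
        "current_action_description": current_action_description,
    }
    # The handler falls back to a single navigate step when this key is absent.
    if instructions:
        payload["instructions"] = instructions
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/run-bot",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_run_bot_request(
    "http://localhost:8080",
    session_id="demo-session",
    website_name="https://example.com",
    session_goal="Find the pricing page",
)

# To send it, with the Flask app running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```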

__init__(self, website_name: str, session_id: str = None, browser_type: str = 'chromium', headless: bool = True, max_steps: int = 10)
- Initialize self. See help(type(self)) for accurate signature.
check_llm_response(self, response)
- No docstring available.
click(self, selector)
- No docstring available.
close(self)
- No docstring available.
create_gif_from_pngs(self, frame_duration=300)
- Creates a GIF from a folder of PNG images.

Args: frame_duration (int): Duration of each frame in milliseconds. Defaults to 300.

Example: bot.create_gif_from_pngs(frame_duration=500)

create_prompt_vars(self, last_message)
- No docstring available.
execute_custom_command(self, command)
- Executes a custom command on the page object.

Args: command (str): The command string to be executed.

execute_instructions(self, instructions: list, last_message: str = None)
- No docstring available.
get_latest_screenshot_path(self)
- No docstring available.
get_locator(self, selector, by_text=True)
- No docstring available.
get_locator_via_roles_and_placeholder(self, selector: str)
- No docstring available.
get_locator_via_roles_and_text(self, selector: str)
- No docstring available.
load_action_log(self)
- No docstring available.
load_cookies(self)
- No docstring available.
mark_screenshot(self, screenshot_bytes, mark_action)
- Marks the screenshot with the specified action.

Parameters: screenshot_bytes (bytes): The bytes of the screenshot. mark_action (dict): Action details for marking the screenshot.

navigate(self, url)
- No docstring available.
parse_element_part(self, element_part)
- Parses the element_part string to extract the method name and its parameters.

Args: element_part (str): The element part string (e.g., "get_by_role('button')")

Returns: tuple: A tuple containing the method name and a list of parameters.
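
The source of parse_element_part is not shown here, so the following is only a sketch of one way to obtain the documented result: a standalone function that uses the standard-library ast module to split a call string into its name and literal positional arguments. Rejecting keyword arguments is an assumption of this sketch, not documented behavior.

```python
import ast
from typing import Any, List, Tuple


def parse_element_part(element_part: str) -> Tuple[str, List[Any]]:
    """Split a call string such as "get_by_role('button')" into (name, params).

    Only literal positional arguments are supported in this sketch.
    """
    node = ast.parse(element_part.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple call expression: {element_part!r}")
    if node.keywords:
        raise ValueError("keyword arguments are not handled by this sketch")
    # literal_eval accepts AST nodes and only evaluates literals, never code.
    params = [ast.literal_eval(arg) for arg in node.args]
    return node.func.id, params


name, params = parse_element_part("get_by_role('button')")
print(name, params)
```

Parsing with ast rather than eval means the string is never executed, which matters when the string originates from an LLM response.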

save_action_log(self)
- No docstring available.
save_cookies(self)
- No docstring available.
scroll(self, direction='down', amount=100)
- No docstring available.
send_prompt_to_llm(self, prompt_vars, screenshot_base64)
- No docstring available.
send_screenshot_to_llm(self, screenshot_bytes, last_message)
- No docstring available.
start_session(self, instructions, session_goal)
- No docstring available.
take_screenshot(self, full_page=False, mark_action=None)
- No docstring available.
type_text(self, selector, text)
- No docstring available.
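
The entries for send_screenshot_to_llm, create_prompt_vars, and send_prompt_to_llm carry no docstrings. The sketch below shows one way the three could fit together under the signatures listed above: the screenshot bytes are base64-encoded, combined with prompt variables, and handed to the subclass hook. EchoBot, encode_screenshot, and the 'last_message' dictionary key are hypothetical; the real prompt variables are not documented on this page.

```python
import base64
from typing import Any, Dict, Optional


def encode_screenshot(screenshot_bytes: bytes) -> str:
    """Return the screenshot as an ASCII base64 string, ready for a JSON payload."""
    return base64.b64encode(screenshot_bytes).decode("ascii")


class EchoBot:
    """Hypothetical stand-in whose 'LLM' just reports what it received."""

    def create_prompt_vars(self, last_message: Optional[str]) -> Dict[str, str]:
        # Assumed shape: the real keys are not documented.
        return {"last_message": last_message or ""}

    def send_prompt_to_llm(self, prompt_vars: Dict[str, str], screenshot_base64: str) -> Dict[str, Any]:
        # A real subclass would call an LLM API here.
        return {"prompt_vars": prompt_vars, "image_chars": len(screenshot_base64)}

    def send_screenshot_to_llm(self, screenshot_bytes: bytes, last_message: Optional[str]) -> Dict[str, Any]:
        prompt_vars = self.create_prompt_vars(last_message)
        return self.send_prompt_to_llm(prompt_vars, encode_screenshot(screenshot_bytes))


# Placeholder bytes standing in for a real screenshot: PNG signature plus padding.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 4
result = EchoBot().send_screenshot_to_llm(fake_png, "Clicked the login button")
print(result)
```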
