Tag: ai agent

  • Never Start From Scratch: Persistent Browser Sessions for AI Agents

    Building AI agents that interact with the web presents unique challenges. One of the most frustrating is the lack of a persistent browser session. Imagine an AI assistant that has to log in to a website every time it needs to perform a task. This repetition wastes time, interrupts multi-step work, and can introduce errors. The fix is to maintain persistent browser sessions for your AI agents.

    The Problem with Stateless AI Web Interactions

    Without a persistent browser session, each interaction with a website is treated as a brand new visit. This means your AI agent loses all previous context, including login credentials, cookies, and browsing history. This “stateless” approach forces the agent to start from scratch each time, leading to:

    • Repetitive Logins: Constant login prompts hinder automation and slow down processes.
    • Loss of Context: Crucial information from previous interactions is lost, impacting the agent’s ability to perform complex tasks.
    • Inefficient Resource Use: Repeatedly loading websites and resources consumes unnecessary time and computing power.

    The Power of Persistent Browser Sessions for AI

    A persistent browser session lets your agent keep its browser state on a website across multiple interactions instead of starting over each time. This means:

    • Eliminate Repetitive Logins: Your AI agent stays logged in, ready to perform tasks without interruption.
    • Preserve Context: Retain crucial information like cookies, browsing history, and form data for seamless task execution.
    • Streamline Workflow: Enable complex, multi-step automation without constantly restarting the process. This is crucial for tasks like web scraping, data extraction, and automated testing.
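
    To make the difference concrete, here is a minimal sketch of the idea in TypeScript. It does not use Browser-Use or a real browser: SessionState, runTask, and the session_id cookie are hypothetical names invented for illustration, and the "login" is simulated. It only shows why restoring saved state removes the repeated login.

```typescript
// Hypothetical model of browser session state; not a Browser-Use API.
interface SessionCookie {
  name: string;
  value: string;
  domain: string;
}

class SessionState {
  private cookies: SessionCookie[] = [];

  setCookie(cookie: SessionCookie): void {
    // Replace any existing cookie with the same domain and name.
    this.cookies = this.cookies.filter(
      (c) => !(c.domain === cookie.domain && c.name === cookie.name)
    );
    this.cookies.push(cookie);
  }

  hasCookie(domain: string, name: string): boolean {
    return this.cookies.some((c) => c.domain === domain && c.name === name);
  }

  // Serialize the state so it could be written to disk between runs.
  serialize(): string {
    return JSON.stringify(this.cookies);
  }

  static restore(saved: string): SessionState {
    const state = new SessionState();
    (JSON.parse(saved) as SessionCookie[]).forEach((c) => state.setCookie(c));
    return state;
  }
}

// Runs one agent task. Returns true if a login was needed first.
function runTask(state: SessionState, domain: string): boolean {
  if (state.hasCookie(domain, "session_id")) {
    return false;
  }
  // Stands in for a real login flow.
  state.setCookie({ name: "session_id", value: "abc123", domain: domain });
  return true;
}

// Stateless: every task starts from an empty state, so every task logs in.
const statelessLogins = [
  runTask(new SessionState(), "example.com"),
  runTask(new SessionState(), "example.com"),
].filter((loggedIn) => loggedIn).length;

// Persistent: the state saved after the first task is restored for the second.
const firstRun = new SessionState();
const firstLogin = runTask(firstRun, "example.com");
const savedState = firstRun.serialize();
const secondLogin = runTask(SessionState.restore(savedState), "example.com");
const persistentLogins = [firstLogin, secondLogin].filter((l) => l).length;

console.log(statelessLogins, persistentLogins); // prints "2 1"
```

    In a real setup the serialized state would be the browser's own profile or cookie storage rather than a JSON string held in memory.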

    How Browser-Use Enables Persistent Sessions

    Browser-Use offers a way to manage persistent browser contexts for AI agents. With it, you can create and maintain browser sessions so that your agents do not repeat setup work between tasks. This is especially useful for long-running browser sessions that require continuous interaction with web applications.

    Installation Guide

    Prerequisites

    • Python 3.11 or higher
    • Git (for cloning the repository)

    Option 1: Local Installation

    Read the quickstart guide or follow the steps below to get started.

    Step 1: Clone the Repository

    git clone https://github.com/browser-use/web-ui.git
    cd web-ui

    Step 2: Set Up Python Environment

    We recommend using uv for managing the Python environment.

    Using uv (recommended):

    uv venv --python 3.11

    Activate the virtual environment:

    • Windows (Command Prompt):
    .venv\Scripts\activate
    • Windows (PowerShell):
    .\.venv\Scripts\Activate.ps1
    • macOS/Linux:
    source .venv/bin/activate

    Step 3: Install Dependencies

    Install Python packages:

    uv pip install -r requirements.txt

    Install Playwright:

    playwright install

    Step 4: Configure Environment

    1. Create a copy of the example environment file:
    • Windows (Command Prompt):
    copy .env.example .env
    • macOS/Linux/Windows (PowerShell):
    cp .env.example .env
    2. Open .env in your preferred text editor and add your API keys and other settings.

    Option 2: Docker Installation

    Prerequisites

    • Docker and Docker Compose
    • Git (for cloning the repository)

    Installation Steps

    1. Clone the repository:
    git clone https://github.com/browser-use/web-ui.git
    cd web-ui
    2. Create and configure the environment file:
    • Windows (Command Prompt):
    copy .env.example .env
    • macOS/Linux/Windows (PowerShell):
    cp .env.example .env

    Edit .env with your preferred text editor and add your API keys

    3. Run with Docker:
    # Build and start the container with default settings (browser closes after AI tasks)
    docker compose up --build
    # Or run with persistent browser (browser stays open between AI tasks)
    CHROME_PERSISTENT_SESSION=true docker compose up --build
    4. Access the Application:
    • Web Interface: Open http://localhost:7788 in your browser
    • VNC Viewer (for watching browser interactions): Open http://localhost:6080/vnc.html
      • Default VNC password: “youvncpassword”
      • Can be changed by setting VNC_PASSWORD in your .env file

    Docker Setup

    Environment Variables:

    All configuration is done through the .env file

    Available environment variables:

    # LLM API Keys
    OPENAI_API_KEY=your_key_here
    ANTHROPIC_API_KEY=your_key_here
    GOOGLE_API_KEY=your_key_here
    
    # Browser Settings
    CHROME_PERSISTENT_SESSION=true   # Set to true to keep browser open between AI tasks
    RESOLUTION=1920x1080x24         # Custom resolution format: WIDTHxHEIGHTxDEPTH
    RESOLUTION_WIDTH=1920           # Custom width in pixels
    RESOLUTION_HEIGHT=1080          # Custom height in pixels
    
    # VNC Settings
    VNC_PASSWORD=your_vnc_password  # Optional, defaults to "vncpassword"
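
    As an illustration of how settings like these might be read, the sketch below parses CHROME_PERSISTENT_SESSION and the WIDTHxHEIGHTxDEPTH format of RESOLUTION from a plain key/value object. This is not the web-ui's own parsing code: parseResolution, parseBrowserSettings, and the fallback of 1920x1080x24 are assumptions made for the example, and RESOLUTION_WIDTH and RESOLUTION_HEIGHT are ignored to keep it short.

```typescript
interface BrowserSettings {
  persistentSession: boolean;
  width: number;
  height: number;
  depth: number;
}

// Parses the WIDTHxHEIGHTxDEPTH format used by RESOLUTION.
function parseResolution(value: string): { width: number; height: number; depth: number } {
  const match = /^(\d+)x(\d+)x(\d+)$/.exec(value.trim());
  if (match === null) {
    throw new Error(`Invalid RESOLUTION "${value}", expected WIDTHxHEIGHTxDEPTH`);
  }
  return {
    width: Number(match[1]),
    height: Number(match[2]),
    depth: Number(match[3]),
  };
}

// Builds settings from an env-like object (hypothetical helper).
// Assumed fallbacks: persistence off, resolution 1920x1080x24.
function parseBrowserSettings(env: { [key: string]: string | undefined }): BrowserSettings {
  const resolution = parseResolution(env["RESOLUTION"] ?? "1920x1080x24");
  const persistentFlag = (env["CHROME_PERSISTENT_SESSION"] ?? "false").trim().toLowerCase();
  return {
    persistentSession: persistentFlag === "true",
    width: resolution.width,
    height: resolution.height,
    depth: resolution.depth,
  };
}

const settings = parseBrowserSettings({
  CHROME_PERSISTENT_SESSION: "true",
  RESOLUTION: "1280x720x24",
});
console.log(settings.persistentSession, settings.width, settings.height); // prints "true 1280 720"
```

    Rejecting a malformed RESOLUTION early, rather than falling back silently, makes a typo in .env visible at startup.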

    Platform Support:

    • Supports both AMD64 and ARM64 architectures
    • On ARM64 systems (e.g., Apple Silicon Macs), the container automatically uses the appropriate image

    Browser Persistence Modes:

    • Default Mode (CHROME_PERSISTENT_SESSION=false):
      • Browser opens and closes with each AI task
      • Clean state for each interaction
      • Lower resource usage
    • Persistent Mode (CHROME_PERSISTENT_SESSION=true):
      • Browser stays open between AI tasks
      • Maintains history and state
      • Allows viewing previous AI interactions
      • Set in the .env file or via an environment variable when starting the container
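
    The two modes can be pictured with a small simulation. The sketch below is not the web-ui implementation: FakeBrowser and TaskRunner are hypothetical stand-ins that only count launches and record visited URLs, to show what "clean state" versus "maintains history" means in practice.

```typescript
// Hypothetical stand-in for a real browser process; not the web-ui code.
class FakeBrowser {
  history: string[] = [];

  visit(url: string): void {
    this.history.push(url);
  }
}

class TaskRunner {
  launches = 0;
  private persistent: boolean;
  private browser: FakeBrowser | null = null;

  constructor(persistent: boolean) {
    this.persistent = persistent;
  }

  // Runs one task and returns the browser history visible during it.
  run(url: string): string[] {
    let browser = this.browser;
    if (browser === null) {
      browser = new FakeBrowser();
      this.launches += 1;
    }
    browser.visit(url);
    const seen = browser.history.slice();
    // Persistent mode keeps the browser; default mode discards it.
    this.browser = this.persistent ? browser : null;
    return seen;
  }
}

const defaultRunner = new TaskRunner(false);
defaultRunner.run("https://example.com/login");
const defaultSeen = defaultRunner.run("https://example.com/report");

const persistentRunner = new TaskRunner(true);
persistentRunner.run("https://example.com/login");
const persistentSeen = persistentRunner.run("https://example.com/report");

console.log(defaultRunner.launches, defaultSeen.length);       // prints "2 1"
console.log(persistentRunner.launches, persistentSeen.length); // prints "1 2"
```

    The trade-off matches the lists above: the default runner pays for a launch per task but never leaks state between tasks, while the persistent runner launches once and carries history forward.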

    Viewing Browser Interactions:

    • Access the noVNC viewer at http://localhost:6080/vnc.html
    • Enter the VNC password (the default, or the value you set in VNC_PASSWORD)
    • Direct VNC access is available on port 5900 (mapped to container port 5901)
    • You can then watch all browser interactions in real time

    Persistent browser sessions are essential for building efficient and robust AI agents that interact with the web. By eliminating repetitive logins, preserving context, and streamlining workflows, you can unlock the true potential of AI web automation. Explore Browser-Use and discover how its persistent session management can revolutionize your AI development process. Start building smarter, more efficient AI agents today!

  • How to add custom actions and skills in Eliza AI?

    Eliza is a versatile multi-agent simulation framework, built in TypeScript, that allows you to create sophisticated, autonomous AI agents. These agents can interact across multiple platforms while maintaining consistent personalities and knowledge. A key feature that enables this flexibility is the ability to define custom actions and skills. This article will delve into how you can leverage this feature to make your Eliza agents even more powerful.

    Understanding Actions in Eliza

    Actions are the fundamental building blocks that dictate how Eliza agents respond to and interact with messages. They allow agents to go beyond simple text replies, enabling them to:

    • Interact with external systems.
    • Modify their behavior dynamically.
    • Perform complex tasks.

    Each action in Eliza consists of several key components:

    • name: A unique identifier for the action.
    • similes: Alternative names or triggers that can invoke the action.
    • description: A detailed explanation of what the action does.
    • validate: A function that checks if the action is appropriate to execute in the current context.
    • handler: The implementation of the action’s behavior – the core logic that the action performs.
    • examples: Examples that demonstrate proper usage patterns.
    • suppressInitialMessage: When set to true, it prevents the initial message from being sent before processing the action.
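
    To illustrate how name and similes can work together as triggers, here is a small sketch of a lookup that accepts either. How Eliza matches action names internally is not covered in this article, so NamedAction, normalizeLabel, and findAction are hypothetical helpers written only for this example.

```typescript
// Minimal local shape; the real Action interface has more fields.
interface NamedAction {
  name: string;
  similes: string[];
}

// Normalizes labels so "read document" and "READ_DOCUMENT" compare equal.
function normalizeLabel(label: string): string {
  return label.trim().toUpperCase().replace(/[\s-]+/g, "_");
}

// Returns the first action whose name or similes match the requested label.
function findAction(actions: NamedAction[], requested: string): NamedAction | undefined {
  const wanted = normalizeLabel(requested);
  for (const action of actions) {
    const labels = [action.name].concat(action.similes).map(normalizeLabel);
    if (labels.indexOf(wanted) !== -1) {
      return action;
    }
  }
  return undefined;
}

const registry: NamedAction[] = [
  { name: "PROCESS_DOCUMENT", similes: ["READ_DOCUMENT", "ANALYZE_DOCUMENT"] },
  { name: "NONE", similes: [] },
];

const matched = findAction(registry, "read document");
console.log(matched ? matched.name : "no match"); // prints "PROCESS_DOCUMENT"
```

    Because similes widen the set of triggers, keeping them distinct across actions avoids two actions competing for the same label.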

    Built-in Actions

    Eliza includes several built-in actions to manage basic conversation flow and external integrations:

    • CONTINUE: Keeps a conversation going when more context is required.
    • IGNORE: Gracefully disengages from a conversation.
    • NONE: Default action for standard conversational replies.
    • TAKE_ORDER: Records and processes user purchase orders (primarily for Solana integration).

    Creating Custom Actions: Expanding Eliza’s Capabilities

    The power of Eliza truly shines when you start implementing custom actions and skills. Here’s how to create them:

    1. Create a custom_actions directory: This is where you’ll store your action files.
    2. Add your action files: Each action is defined in its own TypeScript file, implementing the Action interface.
    3. Configure in elizaConfig.yaml: Point to your custom actions by adding entries under the actions key.
    actions:
        - name: myCustomAction
          path: ./custom_actions/myAction.ts

    Action Configuration Structure

    Here’s an example of how to structure your action file:

    import { Action, IAgentRuntime, Memory } from "@elizaos/core";
    
    export const myAction: Action = {
        name: "MY_ACTION",
        similes: ["SIMILAR_ACTION", "ALTERNATE_NAME"],
        validate: async (runtime: IAgentRuntime, message: Memory) => {
            // Validation logic here
            return true;
        },
        description: "A detailed description of your action.",
        handler: async (runtime: IAgentRuntime, message: Memory) => {
            // The actual logic of your action
            return true;
        },
    };

    Implementing a Custom Action

    • Validation: Before an action runs, its validate function is called to check that all prerequisites for that action are met.
    • Handler: The handler function contains the core logic of the action. It works with the agent runtime and memory to perform the desired tasks, such as calling external APIs, processing data, or generating output.
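
    The relationship between the two functions can be sketched as a small dispatcher: validate runs first, and handler runs only if validation passes. This is a simplified model, not Eliza's runtime. SimpleAction, ChatMessage, and runAction are hypothetical, and the runtime argument from the real Action interface is left out so the example stays self-contained.

```typescript
// Simplified local types; the real interface also receives the agent runtime.
interface ChatMessage {
  text: string;
}

interface SimpleAction {
  name: string;
  validate: (message: ChatMessage) => Promise<boolean>;
  handler: (message: ChatMessage) => Promise<string>;
}

// Runs the handler only when validate reports that prerequisites are met.
async function runAction(action: SimpleAction, message: ChatMessage): Promise<string | null> {
  const canRun = await action.validate(message);
  if (!canRun) {
    return null; // handler is skipped entirely
  }
  return action.handler(message);
}

const echoAction: SimpleAction = {
  name: "ECHO",
  // Prerequisite: the message must contain some non-whitespace text.
  validate: async (message) => message.text.trim().length > 0,
  handler: async (message) => `echo: ${message.text}`,
};

runAction(echoAction, { text: "hello" }).then((reply) => {
  console.log(reply); // prints "echo: hello"
});
```

    Keeping the precondition in validate means the handler can assume its inputs are usable and focus on the work itself.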

    Examples of Custom Actions

    Here are some examples to illustrate the possibilities:

    Basic Action Template:

    const customAction: Action = {
        name: "CUSTOM_ACTION",
        similes: ["SIMILAR_ACTION"],
        description: "Action purpose",
        validate: async (runtime: IAgentRuntime, message: Memory) => {
            // Validation logic
            return true;
        },
        handler: async (runtime: IAgentRuntime, message: Memory) => {
            // Implementation
        },
        examples: [],
    };

    Advanced Action Example: Processing Documents:

    const complexAction: Action = {
        name: "PROCESS_DOCUMENT",
        similes: ["READ_DOCUMENT", "ANALYZE_DOCUMENT"],
        description: "Process and analyze uploaded documents",
        validate: async (runtime, message) => {
            // Default to an empty list so the length check is safe when no attachments exist
            const attachments = message.content.attachments ?? [];
            const supportedTypes = ["pdf", "txt", "doc"];
            return (
                attachments.length > 0 &&
                supportedTypes.includes(attachments[0].type)
            );
        },
        handler: async (runtime, message, state) => {
            const attachment = message.content.attachments[0];
    
            // Process document
            const content = await runtime
                .getService<IDocumentService>(ServiceType.DOCUMENT)
                .processDocument(attachment);
    
            // Store in memory
            await runtime.documentsManager.createMemory({
                id: generateId(),
                content: { text: content },
                userId: message.userId,
                roomId: message.roomId,
            });
    
            return true;
        },
    };

    Best Practices for Custom Actions

    • Single Responsibility: Ensure each action has a single, well-defined purpose.
    • Robust Validation: Always validate inputs and preconditions before executing an action.
    • Clear Error Handling: Implement error catching and provide informative error messages.
    • Detailed Examples: Include examples in the examples field to show the action’s usage.
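
    As one possible take on the error-handling practice, the sketch below wraps a handler so that failures come back as a structured result with an informative message instead of an uncaught exception. The names withErrorHandling and HandlerResult, and the FETCH_PRICE action, are hypothetical and not part of Eliza.

```typescript
type HandlerResult = { ok: true } | { ok: false; error: string };

// Wraps a handler so thrown errors become a result that names the action.
function withErrorHandling<M>(
  actionName: string,
  handler: (message: M) => Promise<void>
): (message: M) => Promise<HandlerResult> {
  return async (message: M): Promise<HandlerResult> => {
    try {
      await handler(message);
      return { ok: true };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return { ok: false, error: `${actionName} failed: ${reason}` };
    }
  };
}

// Hypothetical handler that rejects empty input.
const fetchPrice = withErrorHandling("FETCH_PRICE", async (symbol: string) => {
  if (symbol.length === 0) {
    throw new Error("symbol is empty");
  }
  // ... a real handler would call an external API here ...
});

fetchPrice("").then((result) => {
  console.log(result.ok ? "ok" : result.error); // prints "FETCH_PRICE failed: symbol is empty"
});
```

    Including the action name in the message makes it easier to trace a failure when several actions are registered.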

    Testing Your Actions

    Eliza provides a built-in testing framework to validate your actions:

    test("Validate action behavior", async () => {
        const message: Memory = {
            userId: user.id,
            content: { text: "Test message" },
            roomId,
        };
    
        const response = await handleMessage(runtime, message);
        // Verify response
    });
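
    Validation logic is often easiest to test when it is pulled out into a pure function. The sketch below does this for the attachment check from the document-processing example, using a tiny hand-rolled check helper instead of a test framework. DocAttachment, hasSupportedAttachment, and check are names assumed for illustration.

```typescript
// Minimal local shape for an attachment; only the type field is modeled.
interface DocAttachment {
  type: string;
}

const SUPPORTED_TYPES = ["pdf", "txt", "doc"];

// True when the first attachment exists and has a supported type.
function hasSupportedAttachment(attachments: DocAttachment[] | undefined): boolean {
  const first = attachments === undefined ? undefined : attachments[0];
  if (first === undefined) {
    return false;
  }
  return SUPPORTED_TYPES.indexOf(first.type) !== -1;
}

// Tiny assertion helper so the sketch runs without a test framework.
function check(condition: boolean, label: string): void {
  if (!condition) {
    throw new Error(`failed: ${label}`);
  }
}

check(hasSupportedAttachment([{ type: "pdf" }]), "pdf is accepted");
check(!hasSupportedAttachment([{ type: "png" }]), "png is rejected");
check(!hasSupportedAttachment([]), "empty list is rejected");
check(!hasSupportedAttachment(undefined), "missing attachments are rejected");
console.log("all checks passed");
```

    An action's validate function can then delegate to a helper like this, leaving the full message-handling test to cover only the wiring.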

    Custom actions and skills are crucial for unlocking the full potential of Eliza. By creating your own actions, you can tailor Eliza to specific use cases, whether it’s automating complex workflows, integrating with external services, or creating unique, engaging interactions. The flexibility and power provided by this system allow you to push the boundaries of what’s possible with autonomous AI agents.
