Category: Agents

  • OpenManus: FULLY FREE Manus Alternative

    OpenManus: FULLY FREE Manus Alternative

    Manus has been billed as the first general-purpose AI agent, but access is gated behind an invite code and, eventually, a paid plan. Pricing hasn’t been announced yet, but it clearly won’t be free. So what do we do now? Well, let’s turn to our saviours, the open-source community.

    Well, guess what? OpenManus is like the answer to your prayers! It’s basically a free version of Manus that you can just download and use right now. It does all that cool AI agent stuff like figuring things out on its own, working with other programs, and automating tasks. And the best part? You don’t have to wait in line or pay anything, and you can see exactly how it’s built. Pretty awesome, huh?

    OpenManus is an open-source project designed to allow users to create and utilize their own AI agents without requiring an invite code, unlike the proprietary Manus platform. It’s developed by a team including members from MetaGPT and aims to democratize access to AI agent creation.

    Key Features

    • No Invite Code Required: Unlike Manus, OpenManus eliminates the need for an invite code, making it accessible to everyone.
    • Open-Source Implementation: The project is fully open-source, encouraging community contributions and improvements.
    • Integration with OpenManus-RL: Collaborates with researchers from UIUC on reinforcement learning tuning methods for LLM agents.
    • Active Development: The team is actively working on enhancements including improved planning capabilities, standardized evaluation metrics, model adaptation, containerized deployment, and expanded example libraries.

    Technical Setup and Run Steps

    Installation

    Method 1: Using Conda

    Create and activate a new conda environment:

    conda create -n open_manus python=3.12
    conda activate open_manus

    Clone the repository:

    git clone https://github.com/mannaandpoem/OpenManus.git
    cd OpenManus

    Install dependencies:

    pip install -r requirements.txt

    Method 2: Using uv (Recommended)

    Install uv:

    curl -LsSf https://astral.sh/uv/install.sh | sh

    Clone the repository:

    git clone https://github.com/mannaandpoem/OpenManus.git
    cd OpenManus

    Create and activate a virtual environment:

    uv venv
    source .venv/bin/activate  # On Unix/macOS
    # Or on Windows:
    # .venv\Scripts\activate

    Install dependencies:

    uv pip install -r requirements.txt

    Configuration:

    Create a config.toml file in the config directory by copying the example:

    cp config/config.example.toml config/config.toml

    Edit config/config.toml to add your API keys and customize settings:

    # Global LLM configuration
    [llm]
    model = "gpt-4o"
    base_url = "https://api.openai.com/v1"
    api_key = "sk-..."  # Replace with your actual API key
    max_tokens = 4096
    temperature = 0.0
    
    # Optional configuration for specific LLM models
    [llm.vision]
    model = "gpt-4o"
    base_url = "https://api.openai.com/v1"
    api_key = "sk-..."  # Replace with your actual API key

    Running OpenManus

    After completing the installation and configuration steps, you can run OpenManus with a single command. The specific command may vary depending on your setup, but generally, you can execute:

    python main.py

    Then input your idea via the terminal when prompted.

    For the unstable version, you might need to use a different command as specified in the project documentation.

  • Never Start From Scratch: Persistent Browser Sessions for AI Agents

    Never Start From Scratch: Persistent Browser Sessions for AI Agents

    Building AI agents that interact with the web presents unique challenges. One of the most frustrating is the lack of a persistent browser session for AI. Imagine an AI assistant that has to log in to a website every time it needs to perform a task. This repetitive process is not only time-consuming but also disrupts the flow of information and can lead to errors. Fortunately, there’s a solution: maintaining persistent browser sessions for your AI agents.

    The Problem with Stateless AI Web Interactions

    Without a persistent browser session, each interaction with a website is treated as a brand new visit. This means your AI agent loses all previous context, including login credentials, cookies, and browsing history. This “stateless” approach forces the agent to start from scratch each time, leading to:

    • Repetitive Logins: Constant login prompts hinder automation and slow down processes.
    • Loss of Context: Crucial information from previous interactions is lost, impacting the agent’s ability to perform complex tasks.
    • Inefficient Resource Use: Repeatedly loading websites and resources consumes unnecessary time and computing power.

    The Power of Persistent Browser Sessions for AI

    A persistent browser session for AI allows your agent to maintain a continuous connection with a website, preserving its state across multiple interactions (see the sketch after this list). This means:

    • Eliminate Repetitive Logins: Your AI agent stays logged in, ready to perform tasks without interruption.
    • Preserve Context: Retain crucial information like cookies, browsing history, and form data for seamless task execution.
    • Streamline Workflow: Enable complex, multi-step automation without constantly restarting the process. This is crucial for tasks like web scraping, data extraction, and automated testing.
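
    To make the difference concrete, here is a minimal Python sketch of a persistent browser profile using Playwright (Playwright is used here purely for illustration and is an assumption, not something Browser-Use requires; the URL and profile directory are placeholders):

    # Minimal sketch: throwaway vs. persistent browser state with Playwright.
    # Assumes `pip install playwright` and `playwright install chromium`.
    from playwright.sync_api import sync_playwright
    
    with sync_playwright() as p:
        # Throwaway browser: cookies and login state vanish when it closes.
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com")
        browser.close()
    
        # Persistent context: state is written to user_data_dir and survives
        # restarts, so an agent can stay logged in across tasks.
        context = p.chromium.launch_persistent_context(
            user_data_dir="./agent-profile",
            headless=True,
        )
        page = context.new_page()
        page.goto("https://example.com")
        context.close()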

    How Browser-Use Enables Persistent Sessions

    Browser-Use offers a powerful solution for managing a persistent browser context for AI. By leveraging its features, you can easily create and maintain browser sessions, allowing your AI agents to operate with maximum efficiency. This functionality is especially beneficial for long-running AI browser sessions that require continuous interaction with web applications.

    Installation Guide

    Prerequisites

    • Python 3.11 or higher
    • Git (for cloning the repository)

    Option 1: Local Installation

    Read the quickstart guide or follow the steps below to get started.

    Step 1: Clone the Repository

    git clone https://github.com/browser-use/web-ui.git
    cd web-ui

    Step 2: Set Up Python Environment

    We recommend using uv for managing the Python environment.

    Using uv (recommended):

    uv venv --python 3.11

    Activate the virtual environment:

    • Windows (Command Prompt):
    .venv\Scripts\activate
    • Windows (PowerShell):
    .\.venv\Scripts\Activate.ps1
    • macOS/Linux:
    source .venv/bin/activate

    Step 3: Install Dependencies

    Install Python packages:

    uv pip install -r requirements.txt

    Install Playwright:

    playwright install

    Step 4: Configure Environment

    1. Create a copy of the example environment file:
    • Windows (Command Prompt):
    copy .env.example .env
    • macOS/Linux/Windows (PowerShell):
    cp .env.example .env
    2. Open .env in your preferred text editor and add your API keys and other settings

    Option 2: Docker Installation

    Prerequisites

    • Docker and Docker Compose installed

    Installation Steps

    1. Clone the repository:
    git clone https://github.com/browser-use/web-ui.git
    cd web-ui
    2. Create and configure environment file:
    • Windows (Command Prompt):
    copy .env.example .env
    • macOS/Linux/Windows (PowerShell):
    cp .env.example .env

    Edit .env with your preferred text editor and add your API keys

    3. Run with Docker:
    # Build and start the container with default settings (browser closes after AI tasks)
    docker compose up --build
    # Or run with persistent browser (browser stays open between AI tasks)
    CHROME_PERSISTENT_SESSION=true docker compose up --build
    4. Access the Application:
    • Web Interface: Open http://localhost:7788 in your browser
    • VNC Viewer (for watching browser interactions): Open http://localhost:6080/vnc.html
      • Default VNC password: “youvncpassword”
      • Can be changed by setting VNC_PASSWORD in your .env file

    Docker Setup

    Environment Variables:

    All configuration is done through the .env file

    Available environment variables:

    # LLM API Keys
    OPENAI_API_KEY=your_key_here
    ANTHROPIC_API_KEY=your_key_here
    GOOGLE_API_KEY=your_key_here
    
    # Browser Settings
    CHROME_PERSISTENT_SESSION=true   # Set to true to keep browser open between AI tasks
    RESOLUTION=1920x1080x24         # Custom resolution format: WIDTHxHEIGHTxDEPTH
    RESOLUTION_WIDTH=1920           # Custom width in pixels
    RESOLUTION_HEIGHT=1080          # Custom height in pixels
    
    # VNC Settings
    VNC_PASSWORD=your_vnc_password  # Optional, defaults to "vncpassword"

    Platform Support:

    Supports both AMD64 and ARM64 architectures

    For ARM64 systems (e.g., Apple Silicon Macs), the container will automatically use the appropriate image

    Browser Persistence Modes:

    Default Mode (CHROME_PERSISTENT_SESSION=false):

    • Browser opens and closes with each AI task
    • Clean state for each interaction
    • Lower resource usage

    Persistent Mode (CHROME_PERSISTENT_SESSION=true):

    • Browser stays open between AI tasks
    • Maintains history and state
    • Allows viewing previous AI interactions
    • Set in the .env file or via environment variable when starting the container

    Viewing Browser Interactions:

    • Access the noVNC viewer at http://localhost:6080/vnc.html
    • Enter the VNC password (default: “vncpassword”, or whatever you set in VNC_PASSWORD)
    • Direct VNC access is available on port 5900 (mapped to container port 5901)
    • You can now watch all browser interactions in real time

    Persistent browser sessions are essential for building efficient and robust AI agents that interact with the web. By eliminating repetitive logins, preserving context, and streamlining workflows, you can unlock the true potential of AI web automation. Explore Browser-Use and discover how its persistent session management can revolutionize your AI development process. Start building smarter, more efficient AI agents today!

  • Build Your Own and Free AI Health Assistant, Personalized Healthcare

    Build Your Own and Free AI Health Assistant, Personalized Healthcare

    Imagine having a 24/7 health companion that analyzes your medical history, tracks real-time vitals, and offers tailored advice—all while keeping your data private. This is the reality of AI health assistants, open-source tools merging artificial intelligence with healthcare to empower individuals and professionals alike. Let’s dive into how these systems work, their transformative benefits, and how you can build one using platforms like OpenHealthForAll.

    What Is an AI Health Assistant?

    An AI health assistant is a digital tool that leverages machine learning, natural language processing (NLP), and data analytics to provide personalized health insights. For example:

    • OpenHealth consolidates blood tests, wearable data, and family history into structured formats, enabling GPT-powered conversations about your health.
    • Aiden, another assistant, uses WhatsApp to deliver habit-building prompts based on anonymized data from Apple Health or Fitbit.

    These systems prioritize privacy, often running locally or using encryption to protect sensitive information.


    Why AI Health Assistants Matter: 5 Key Benefits

    1. Centralized Health Management
      Integrate wearables, lab reports, and EHRs into one platform. OpenHealth, for instance, parses blood tests and symptoms into actionable insights using LLMs like Claude or Gemini.
    2. Real-Time Anomaly Detection
      Projects like Kavya Prabahar’s virtual assistant use RNNs to flag abnormal heart rates or predict fractures from X-rays.
    3. Privacy-First Design
      Tools like Aiden anonymize data via Evervault and store records on blockchain (e.g., NearestDoctor’s smart contracts) to ensure compliance with regulations like HIPAA.
    4. Empathetic Patient Interaction
      Assistants like OpenHealth use emotion-aware AI to provide compassionate guidance, reducing anxiety for users managing chronic conditions.
    5. Cost-Effective Scalability
      Open-source frameworks like Google’s Open Health Stack (OHS) help developers build offline-capable solutions for low-resource regions, accelerating global healthcare access.

    Challenges and Ethical Considerations

    While promising, AI health assistants face hurdles:

    • Data Bias: Models trained on limited datasets may misdiagnose underrepresented groups.
    • Interoperability: Bridging EHR systems (e.g., HL7 FHIR) with AI requires standardization efforts like OHS.
    • Regulatory Compliance: Solutions must balance innovation with safety, as highlighted in Nature’s call for mandatory feedback loops in AI health tech.

    Build Your Own AI Health Assistant: A Developer’s Guide

    Step 1: Choose Your Stack

    • Data Parsing: Use OpenHealth’s Python-based parser (migrating to TypeScript soon) to structure inputs from wearables or lab reports.
    • AI Models: Integrate LLaMA or GPT-4 via APIs, or run Ollama locally for privacy (a minimal local-call sketch follows below).
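
    If you go the local route, a minimal sketch of querying a locally running Ollama server over its REST API might look like this (it assumes Ollama is installed and a model such as llama3 has already been pulled; the model name and prompt are illustrative):

    # Minimal sketch: query a local Ollama server instead of a hosted API.
    # Assumes Ollama is running on its default port and `ollama pull llama3` was run.
    import requests
    
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",  # illustrative model name
            "prompt": "Summarize these blood test results: ...",
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])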

    Step 2: Prioritize Security

    • Encrypt user data with Supabase or Evervault.
    • Implement blockchain for audit trails, as seen in NearestDoctor’s medical records system.

    Step 3: Start the setup

    Clone the Repository:

    git clone https://github.com/OpenHealthForAll/open-health.git
    cd open-health

    Setup and Run:

    # Copy environment file
    cp .env.example .env
    
    # Add API keys to .env file:
    # UPSTAGE_API_KEY - For parsing (You can get $10 credit without card registration by signing up at https://www.upstage.ai)
    # OPENAI_API_KEY - For enhanced parsing capabilities
    
    # Start the application using Docker Compose
    docker compose --env-file .env up

    For existing users, use:

    docker compose --env-file .env up --build
    Access OpenHealth: Open your browser and navigate to http://localhost:3000 to begin using OpenHealth.

    The Future of AI Health Assistants

    1. Decentralized AI Marketplaces: Platforms like Ocean Protocol could let users monetize health models securely.
    2. AI-Powered Diagnostics: Google’s Health AI Developer Foundations aim to simplify building diagnostic tools for conditions like diabetes.
    3. Global Accessibility: Initiatives like OHS workshops in Kenya and India are democratizing AI health tech.

    Your Next Step

    • Contribute to OpenHealth’s GitHub repo to enhance its multilingual support.
  • Deploy an uncensored DeepSeek R1 model on Google Cloud Run

    Deploy an uncensored DeepSeek R1 model on Google Cloud Run

    DeepSeek R1 Distill: Complete Tutorial for Deployment & Fine-Tuning

    Are you eager to explore the capabilities of the DeepSeek R1 Distill model? This guide provides a comprehensive, step-by-step approach to deploying the uncensored DeepSeek R1 Distill model to Google Cloud Run with GPU support, and also walks you through a practical fine-tuning process. The tutorial is broken down into the following sections:

    • Environment Setup
    • FastAPI Inference Server
    • Docker Configuration
    • Google Cloud Run Deployment
    • Fine-Tuning Pipeline

    Let’s dive in and get started.

    1. Environment Setup

    Before deploying and fine-tuning, make sure you have the required tools installed and configured.

    1.1 Install Required Tools

    • Python 3.9+
    • pip: For Python package installation
    • Docker: For containerization
    • Google Cloud CLI: For deployment

    Install Google Cloud CLI (Ubuntu/Debian):
    Follow the official Google Cloud CLI installation guide to install gcloud.

    1.2 Authenticate with Google Cloud

    Run the following commands to initialize and authenticate with Google Cloud:

    gcloud init
    gcloud auth application-default login

    Ensure you have an active Google Cloud project with Cloud Run, Compute Engine, and Container Registry/Artifact Registry enabled.

    2. FastAPI Inference Server

    We’ll create a minimal FastAPI application that serves two main endpoints:

    • /v1/inference: For model inference.
    • /v1/finetune: For uploading fine-tuning data (JSONL).

    Create a file named main.py with the following content:

    # main.py
    from fastapi import FastAPI, File, UploadFile
    from fastapi.responses import JSONResponse
    from pydantic import BaseModel
    import json
    
    import litellm  # Minimalistic LLM library
    
    app = FastAPI()
    
    class InferenceRequest(BaseModel):
        prompt: str
        max_tokens: int = 512
    
    @app.post("/v1/inference")
    async def inference(request: InferenceRequest):
        """
        Inference endpoint using deepseek-r1-distill-7b (uncensored).
        """
        response = litellm.completion(
            model="deepseek/deepseek-r1-distill-7b",
            messages=[{"role": "user", "content": request.prompt}],
            max_tokens=request.max_tokens
        )
        return JSONResponse(content=response)
    
    @app.post("/v1/finetune")
    async def finetune(file: UploadFile = File(...)):
        """
        Fine-tune endpoint that accepts a JSONL file.
        """
        if not file.filename.endswith('.jsonl'):
            return JSONResponse(
                status_code=400,
                content={"error": "Only .jsonl files are accepted for fine-tuning"}
            )
    
        # Read lines from uploaded file
        data = [json.loads(line) for line in file.file]
    
        # Perform or schedule a fine-tuning job here (simplified placeholder)
        # You can integrate with your training pipeline below.
        
        return JSONResponse(content={"status": "Fine-tuning request received", "samples": len(data)})

    3. Docker Configuration

    To containerize the application, create a requirements.txt file:

    fastapi
    uvicorn
    litellm
    pydantic
    transformers
    datasets
    accelerate
    trl
    torch

    And create a Dockerfile:

    # Dockerfile
    FROM nvidia/cuda:12.0.0-base-ubuntu22.04
    
    # Install basic dependencies
    RUN apt-get update && apt-get install -y python3 python3-pip
    
    # Create app directory
    WORKDIR /app
    
    # Copy requirements and install
    COPY requirements.txt .
    RUN pip3 install --upgrade pip
    RUN pip3 install --no-cache-dir -r requirements.txt
    
    # Copy code
    COPY . .
    
    # Expose port 8080 for Cloud Run
    EXPOSE 8080
    
    # Start server
    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]

    4. Deploy to Google Cloud Run with GPU

    4.1 Enable GPU on Cloud Run

    Make sure your Google Cloud project has a GPU quota available, such as nvidia-l4.

    4.2 Build and Deploy

    Run this command from your project directory to deploy the application to Cloud Run:

    gcloud run deploy deepseek-uncensored \
        --source . \
        --region us-central1 \
        --platform managed \
        --gpu 1 \
        --gpu-type nvidia-l4 \
        --memory 16Gi \
        --cpu 4 \
        --allow-unauthenticated

    This command builds the Docker image, deploys it to Cloud Run with one nvidia-l4 GPU, allocates 16 GiB memory and 4 CPU cores, and exposes the service publicly (no authentication).

    5. Fine-Tuning Pipeline

    This section will guide you through a basic four-stage fine-tuning pipeline similar to DeepSeek R1’s training approach.

    5.1 Directory Structure Example

    .
    ├── main.py
    ├── finetune_pipeline.py
    ├── cold_start_data.jsonl
    ├── reasoning_data.jsonl
    ├── data_collection.jsonl
    ├── final_data.jsonl
    ├── requirements.txt
    └── Dockerfile

    Replace the .jsonl files with your actual training data.
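
    Each line in these files should be a JSON object with “prompt” and “completion” keys, since that is the format the tokenization and reward steps below expect. The two lines here are illustrative examples only:

    {"prompt": "What is the capital of France?", "completion": "The capital of France is Paris."}
    {"prompt": "Explain photosynthesis in one sentence.", "completion": "Photosynthesis is how plants convert light into chemical energy."}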

    5.2 Fine-Tuning Code: finetune_pipeline.py

    Create a finetune_pipeline.py file with the following code:

    # finetune_pipeline.py
    
    import json
    import os
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)
    from datasets import load_dataset
    
    from trl import PPOTrainer, PPOConfig, AutoModelForCausalLMWithValueHead
    
    
    # 1. Cold Start Phase
    def cold_start_finetune(
        base_model="deepseek-ai/deepseek-r1-distill-7b",
        train_file="cold_start_data.jsonl",
        output_dir="cold_start_finetuned_model"
    ):
        # Load model and tokenizer
        model = AutoModelForCausalLM.from_pretrained(base_model)
        tokenizer = AutoTokenizer.from_pretrained(base_model)
    
        # Load dataset
        dataset = load_dataset("json", data_files=train_file, split="train")
    
        # Simple tokenization function
        def tokenize_function(example):
            return tokenizer(
                example["prompt"] + "\n" + example["completion"],
                truncation=True,
                max_length=512
            )
    
        dataset = dataset.map(tokenize_function, batched=True)
        dataset = dataset.shuffle()
    
        # Define training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=1,
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            save_steps=50,
            logging_steps=50,
            learning_rate=5e-5
        )
    
        # Trainer
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=dataset
        )
    
        trainer.train()
        trainer.save_model(output_dir)
        tokenizer.save_pretrained(output_dir)
        return output_dir
    
    
    # 2. Reasoning RL Training
    def reasoning_rl_training(
        cold_start_model_dir="cold_start_finetuned_model",
        train_file="reasoning_data.jsonl",
        output_dir="reasoning_rl_model"
    ):
        # Config for PPO
        config = PPOConfig(
            batch_size=16,
            learning_rate=1e-5,
            log_with=None,  # or 'wandb'
            mini_batch_size=4
        )
    
        # Load model and tokenizer
        model = AutoModelForCausalLMWithValueHead.from_pretrained(cold_start_model_dir)
        tokenizer = AutoTokenizer.from_pretrained(cold_start_model_dir)
    
        # Create a PPO trainer
        ppo_trainer = PPOTrainer(
            config,
            model,
            tokenizer=tokenizer,
        )
    
        # Load dataset
        dataset = load_dataset("json", data_files=train_file, split="train")
    
        # Simple RL loop (pseudo-coded for brevity)
        for sample in dataset:
            prompt = sample["prompt"]
            desired_answer = sample["completion"]  # For reward calculation
    
            # Generate response
            query_tensors = tokenizer.encode(prompt, return_tensors="pt")
            response_tensors = ppo_trainer.generate(query_tensors, max_new_tokens=50)
            response_text = tokenizer.decode(response_tensors[0], skip_special_tokens=True)
    
            # Calculate reward (simplistic: measure overlap or correctness)
            reward = 1.0 if desired_answer in response_text else -1.0
    
            # Run a PPO step
            ppo_trainer.step([query_tensors[0]], [response_tensors[0]], [reward])
    
        model.save_pretrained(output_dir)
        tokenizer.save_pretrained(output_dir)
        return output_dir
    
    
    # 3. Data Collection
    def collect_data(
        rl_model_dir="reasoning_rl_model",
        num_samples=1000,
        output_file="data_collection.jsonl"
    ):
        """
        Example data collection: generate completions from the RL model.
        This is a simple version that just uses random prompts or a given file of prompts.
        """
        tokenizer = AutoTokenizer.from_pretrained(rl_model_dir)
        model = AutoModelForCausalLM.from_pretrained(rl_model_dir)
    
        # Suppose we have some random prompts:
        prompts = [
            "Explain quantum entanglement",
            "Summarize the plot of 1984 by George Orwell",
            # ... add or load from a prompt file ...
        ]
    
        collected = []
        for i in range(num_samples):
            prompt = prompts[i % len(prompts)]
            inputs = tokenizer(prompt, return_tensors="pt")
            outputs = model.generate(**inputs, max_new_tokens=50)
            completion = tokenizer.decode(outputs[0], skip_special_tokens=True)
            collected.append({"prompt": prompt, "completion": completion})
    
        # Save to JSONL
        with open(output_file, "w") as f:
            for item in collected:
                f.write(f"{item}\n")
    
        return output_file
    
    
    # 4. Final RL Phase
    def final_rl_phase(
        rl_model_dir="reasoning_rl_model",
        final_data="final_data.jsonl",
        output_dir="final_rl_model"
    ):
        """
        Another RL phase using a new dataset or adding human feedback.
        This is a simplified approach similar to the reasoning RL training step.
        """
        config = PPOConfig(
            batch_size=16,
            learning_rate=1e-5,
            log_with=None,
            mini_batch_size=4
        )
    
        model = AutoModelForCausalLMWithValueHead.from_pretrained(rl_model_dir)
        tokenizer = AutoTokenizer.from_pretrained(rl_model_dir)
        ppo_trainer = PPOTrainer(config, model, tokenizer=tokenizer)
    
        dataset = load_dataset("json", data_files=final_data, split="train")
    
        for sample in dataset:
            prompt = sample["prompt"]
            desired_answer = sample["completion"]
            query_tensors = tokenizer.encode(prompt, return_tensors="pt")
            response_tensors = ppo_trainer.generate(query_tensors, max_new_tokens=50)
            response_text = tokenizer.decode(response_tensors[0], skip_special_tokens=True)
    
            reward = 1.0 if desired_answer in response_text else 0.0
            ppo_trainer.step([query_tensors[0]], [response_tensors[0]], [reward])
    
        model.save_pretrained(output_dir)
        tokenizer.save_pretrained(output_dir)
        return output_dir
    
    
    # END-TO-END PIPELINE EXAMPLE
    if __name__ == "__main__":
        # 1) Cold Start
        cold_start_out = cold_start_finetune(
            base_model="deepseek-ai/deepseek-r1-distill-7b",
            train_file="cold_start_data.jsonl",
            output_dir="cold_start_finetuned_model"
        )
    
        # 2) Reasoning RL
        reasoning_rl_out = reasoning_rl_training(
            cold_start_model_dir=cold_start_out,
            train_file="reasoning_data.jsonl",
            output_dir="reasoning_rl_model"
        )
    
        # 3) Data Collection
        data_collection_out = collect_data(
            rl_model_dir=reasoning_rl_out,
            num_samples=100,
            output_file="data_collection.jsonl"
        )
    
        # 4) Final RL Phase
        final_rl_out = final_rl_phase(
            rl_model_dir=reasoning_rl_out,
            final_data="final_data.jsonl",
            output_dir="final_rl_model"
        )
    
        print("All done! Final model stored in:", final_rl_out)

    Usage Overview

    1. Upload Your Data:
      • Prepare cold_start_data.jsonl, reasoning_data.jsonl, final_data.jsonl, etc.
      • Each line should be a JSON object with “prompt” and “completion” keys.
    2. Run the Pipeline Locally:
    python3 finetune_pipeline.py

    This creates directories like cold_start_finetuned_model, reasoning_rl_model, and final_rl_model.

    3. Deploy:
      • Build and push via gcloud run deploy.
    4. Inference:
      • After deployment, send a POST request to your Cloud Run service:
    import requests
    
    url = "https://<YOUR-CLOUD-RUN-URL>/v1/inference"
    data = {"prompt": "Tell me about quantum physics", "max_tokens": 100}
    response = requests.post(url, json=data)
    print(response.json())

    Fine-Tuning via Endpoint:

    • Upload new data for fine-tuning:
    import requests
    
    url = "https://<YOUR-CLOUD-RUN-URL>/v1/finetune"
    with open("new_training_data.jsonl", "rb") as f:
        r = requests.post(url, files={"file": ("new_training_data.jsonl", f)})
    print(r.json())

    This tutorial has provided an end-to-end pipeline for deploying and fine-tuning the DeepSeek R1 Distill model. You’ve learned how to:

    • Deploy a FastAPI server with Docker and GPU support on Google Cloud Run.
    • Fine-tune the model in four stages: Cold Start, Reasoning RL, Data Collection, and Final RL.
    • Use TRL (PPO) for basic RL-based training loops.

    Disclaimer: Deploying uncensored models has ethical and legal implications. Make sure to comply with relevant laws, policies, and usage guidelines.

    This comprehensive guide should equip you with the knowledge to start deploying and fine-tuning the DeepSeek R1 Distill model.

  • How to add custom actions and skills in Eliza AI?

    How to add custom actions and skills in Eliza AI?

    Eliza is a versatile multi-agent simulation framework, built in TypeScript, that allows you to create sophisticated, autonomous AI agents. These agents can interact across multiple platforms while maintaining consistent personalities and knowledge. A key feature that enables this flexibility is the ability to define custom actions and skills. This article will delve into how you can leverage this feature to make your Eliza agents even more powerful.

    Understanding Actions in Eliza

    Actions are the fundamental building blocks that dictate how Eliza agents respond to and interact with messages. They allow agents to go beyond simple text replies, enabling them to:

    • Interact with external systems.
    • Modify their behavior dynamically.
    • Perform complex tasks.

    Each action in Eliza consists of several key components:

    • name: A unique identifier for the action.
    • similes: Alternative names or triggers that can invoke the action.
    • description: A detailed explanation of what the action does.
    • validate: A function that checks if the action is appropriate to execute in the current context.
    • handler: The implementation of the action’s behavior – the core logic that the action performs.
    • examples: Demonstrations of proper usage patterns.
    • suppressInitialMessage: When set to true, it prevents the initial message from being sent before processing the action.

    Built-in Actions

    Eliza includes several built-in actions to manage basic conversation flow and external integrations:

    • CONTINUE: Keeps a conversation going when more context is required.
    • IGNORE: Gracefully disengages from a conversation.
    • NONE: Default action for standard conversational replies.
    • TAKE_ORDER: Records and processes user purchase orders (primarily for Solana integration).

    Creating Custom Actions: Expanding Eliza’s Capabilities

    The power of Eliza truly shines when you start implementing custom actions and skills. Here’s how to create them:

    1. Create a custom_actions directory: This is where you’ll store your action files.
    2. Add your action files: Each action is defined in its own TypeScript file, implementing the Action interface.
    3. Configure in elizaConfig.yaml: Point to your custom actions by adding entries under the actions key.
    actions:
        - name: myCustomAction
          path: ./custom_actions/myAction.ts

    Action Configuration Structure

    Here’s an example of how to structure your action file:

    import { Action, IAgentRuntime, Memory } from "@elizaos/core";
    
    export const myAction: Action = {
        name: "MY_ACTION",
        similes: ["SIMILAR_ACTION", "ALTERNATE_NAME"],
        validate: async (runtime: IAgentRuntime, message: Memory) => {
            // Validation logic here
            return true;
        },
        description: "A detailed description of your action.",
        handler: async (runtime: IAgentRuntime, message: Memory) => {
            // The actual logic of your action
            return true;
        },
    };

    Implementing a Custom Action

    • Validation: Before an action executes, its validate function is called to determine whether it can proceed; it checks that all prerequisites for the action are met.
    • Handler: The handler function contains the core logic of the action. It interacts with the agent runtime and memory and performs the desired tasks, such as calling external APIs, processing data, or generating output.

    Examples of Custom Actions

    Here are some examples to illustrate the possibilities:

    Basic Action Template:

    const customAction: Action = {
        name: "CUSTOM_ACTION",
        similes: ["SIMILAR_ACTION"],
        description: "Action purpose",
        validate: async (runtime: IAgentRuntime, message: Memory) => {
            // Validation logic
            return true;
        },
        handler: async (runtime: IAgentRuntime, message: Memory) => {
            // Implementation
        },
        examples: [],
    };

    Advanced Action Example: Processing Documents:

    const complexAction: Action = {
        name: "PROCESS_DOCUMENT",
        similes: ["READ_DOCUMENT", "ANALYZE_DOCUMENT"],
        description: "Process and analyze uploaded documents",
        validate: async (runtime, message) => {
            const hasAttachment = message.content.attachments?.length > 0;
            const supportedTypes = ["pdf", "txt", "doc"];
            return (
                hasAttachment &&
                supportedTypes.includes(message.content.attachments[0].type)
            );
        },
        handler: async (runtime, message, state) => {
            const attachment = message.content.attachments[0];
    
            // Process document
            const content = await runtime
                .getService<IDocumentService>(ServiceType.DOCUMENT)
                .processDocument(attachment);
    
            // Store in memory
            await runtime.documentsManager.createMemory({
                id: generateId(),
                content: { text: content },
                userId: message.userId,
                roomId: message.roomId,
            });
    
            return true;
        },
    };

    Best Practices for Custom Actions

    • Single Responsibility: Ensure each action has a single, well-defined purpose.
    • Robust Validation: Always validate inputs and preconditions before executing an action.
    • Clear Error Handling: Implement error catching and provide informative error messages.
    • Detailed Examples: Include examples in the examples field to show the action’s usage.

    Testing Your Actions

    Eliza provides a built-in testing framework to validate your actions:

    test("Validate action behavior", async () => {
        const message: Memory = {
            userId: user.id,
            content: { text: "Test message" },
            roomId,
        };
    
        const response = await handleMessage(runtime, message);
        // Verify response
    });

    Custom actions and skills are crucial for unlocking the full potential of Eliza. By creating your own actions, you can tailor Eliza to specific use cases, whether it’s automating complex workflows, integrating with external services, or creating unique, engaging interactions. The flexibility and power provided by this system allow you to push the boundaries of what’s possible with autonomous AI agents.


  • AI Agents by Google: Revolutionizing AI with Reasoning and Tools

    AI Agents by Google: Revolutionizing AI with Reasoning and Tools

    Artificial Intelligence is rapidly changing, and AI Agents by Google are at the forefront. These aren’t typical AI models. Instead, they are complex systems. They can reason, make logical decisions, and interact with the world using tools. This article explores what makes them special. Furthermore, it will examine how they are changing AI applications.

    Understanding AI Agents


    Essentially, AI Agents by Google are applications that aim to achieve goals. They do this by observing their environment and using the tools available to them. Unlike basic AI, agents are autonomous. They act independently. Moreover, they proactively make decisions. This helps them meet objectives, even without direct instructions. This is possible through their cognitive architecture, which includes three key parts:

    • The Model: This is the core language model. It is the central decision-maker. It uses reasoning frameworks like ReAct. Also, it uses Chain-of-Thought and Tree-of-Thoughts.
    • The Tools: These are crucial for external interaction. They allow the agent to connect to real-time data and services. For example, APIs can be used. They bridge the gap between internal knowledge and outside resources.
    • The Orchestration Layer: This layer manages the agent’s process. It determines how it takes in data. Then, it reasons internally. Finally, it informs the next action or decision in a continuous cycle. (A rough sketch of this cycle appears below.)
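
    As a rough illustration of how these parts fit together, the sketch below shows a generic ReAct-style loop: the model decides on an action, a tool is called, and the observation is fed back until a final answer is produced. This is an illustrative toy, not Google’s implementation; the fake_model function and the get_weather tool are placeholders standing in for a real LLM and real APIs.

    # Generic ReAct-style agent loop (illustrative sketch only).
    import json
    
    def fake_model(prompt: str) -> str:
        # Placeholder for an LLM call: decide the next step from the context so far.
        if "Observation" in prompt:
            return "FINAL It is sunny in Paris."
        return 'ACTION get_weather {"city": "Paris"}'
    
    TOOLS = {
        "get_weather": lambda args: {"city": args["city"], "forecast": "sunny"},
    }
    
    def run_agent(task: str, max_steps: int = 5) -> str:
        history = f"Task: {task}"
        for _ in range(max_steps):
            decision = fake_model(history)                         # reason
            if decision.startswith("FINAL"):
                return decision[len("FINAL"):].strip()             # final answer
            _, tool_name, raw_args = decision.split(" ", 2)
            observation = TOOLS[tool_name](json.loads(raw_args))   # act via a tool
            history += f"\nObservation: {observation}"             # observe
        return "Stopped after max_steps."
    
    print(run_agent("What is the weather in Paris?"))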

    AI Agents vs. Traditional AI Models

    Traditional AI models have limitations. They are restricted by training data. They perform single inferences. In contrast, AI Agents by Google overcome these limits. They do this through several capabilities:

    • External System Access: They connect to external systems via tools. Thus, they interact with real-time data.
    • Session History Management: Agents track and manage session history. This enables multi-turn interactions with context.
    • Native Tool Implementation: They include built-in tools. This allows seamless execution of external tasks.
    • Cognitive Architectures: They utilize advanced frameworks. For instance, they use CoT and ReAct for reasoning.

    The Role of Tools: Extensions, Functions, and Data Stores

    AI Agents by Google interact with the outside world through three key tools:

    Extensions

    These tools bridge agents and APIs. Through examples, they teach the agent how to use an API to carry out actions. For instance, they can use the Google Flights API. Extensions run on the agent side. They are designed to make integrations scalable and robust.

    Functions

    Functions are self-contained code modules. Models use them for specific tasks. Unlike Extensions, these run on the client side. They don’t directly interact with APIs. This gives developers greater control over data flow and system execution.
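
    To illustrate the client-side nature of Functions, here is a small sketch (again illustrative, not a Google API): the model only sees a declarative description and returns a structured call, while the developer’s own code executes the function. The list_flights function, its schema, and the simulated model output are all placeholders.

    # Illustrative sketch of client-side function execution (not a Google API).
    import json
    
    # Declarative description shared with the model (schema is illustrative).
    FUNCTION_DECLARATION = {
        "name": "list_flights",
        "description": "List flights between two cities on a date.",
        "parameters": {"origin": "string", "destination": "string", "date": "string"},
    }
    
    def list_flights(origin: str, destination: str, date: str) -> list:
        # Client-side implementation; the model never executes this directly.
        return [{"flight": "XY123", "origin": origin, "destination": destination, "date": date}]
    
    # Pretend the model returned this structured call after seeing the declaration.
    model_output = json.dumps({
        "name": "list_flights",
        "args": {"origin": "SFO", "destination": "JFK", "date": "2025-07-01"},
    })
    
    call = json.loads(model_output)
    if call["name"] == FUNCTION_DECLARATION["name"]:
        result = list_flights(**call["args"])  # executed by the developer's code, on the client side
        print(result)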

    Data Stores

    Data Stores enable agents to access diverse data. This includes structured and unstructured data from various sources. For instance, they can access websites, PDFs, and databases. This dynamic interaction with current data enhances the model’s knowledge. Furthermore, it aids applications using Retrieval Augmented Generation (RAG).

    Improving Agent Performance

    To get the best results, AI Agents need targeted learning. These methods include:

    • In-context learning: Examples provided during inference let the model learn “on-the-fly.”
    • Retrieval-based in-context learning: External memory enhances this process. It provides more relevant examples.
    • Fine-tuning based learning: Pre-training the model is key. This improves its understanding of tools. Moreover, it improves its ability to know when to use them.

    Getting Started with AI Agents

    If you’re interested in building with AI Agents, consider using libraries like LangChain. Also, you might use platforms such as Google’s Vertex AI. LangChain helps users ‘chain’ sequences of logic and tool calls. Meanwhile, Vertex AI offers a managed environment. It supports building and deploying production-ready agents.

    AI Agents by Google are transforming AI. They go beyond traditional limits. They can reason, use tools, and interact with the external world. Therefore, they are a major step forward. They create more flexible and capable AI systems. As these agents evolve, their ability to solve complex problems will also grow. In addition, their capacity to drive real-world value will expand.

    Read more in the AI Agents whitepaper by Google.