Manus is billed as the first-ever general AI agent, but it's gated behind invite codes and a price tag. We haven't got the prices yet, but it's not going to be free. So, what do we do now? Well, let's turn to our saviours: the open-source community.
Well, guess what? OpenManus is like the answer to your prayers! It’s basically a free version of Manus that you can just download and use right now. It does all that cool AI agent stuff like figuring things out on its own, working with other programs, and automating tasks. And the best part? You don’t have to wait in line or pay anything, and you can see exactly how it’s built. Pretty awesome, huh?
OpenManus is an open-source project designed to allow users to create and utilize their own AI agents without requiring an invite code, unlike the proprietary Manus platform. It’s developed by a team including members from MetaGPT and aims to democratize access to AI agent creation.
Key Features
No Invite Code Required: Unlike Manus, OpenManus eliminates the need for an invite code, making it accessible to everyone.
Open-Source Implementation: The project is fully open-source, encouraging community contributions and improvements.
Integration with OpenManus-RL: Collaborates with researchers from UIUC on reinforcement learning tuning methods for LLM agents.
Active Development: The team is actively working on enhancements including improved planning capabilities, standardized evaluation metrics, model adaptation, containerized deployment, and expanded example libraries.
Installation
Method 1: Using pip
Clone the repository:
git clone https://github.com/mannaandpoem/OpenManus.git
cd OpenManus
Install dependencies:
pip install -r requirements.txt
Method 2: Using uv (Recommended)
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
Clone the repository:
git clone https://github.com/mannaandpoem/OpenManus.git
cd OpenManus
Create and activate a virtual environment:
uv venv
source .venv/bin/activate # On Unix/macOS
# Or on Windows:
# .venv\Scripts\activate
Install dependencies:
uv pip install -r requirements.txt
Configuration:
Create a config.toml file in the config directory by copying the example:
cp config/config.example.toml config/config.toml
Edit config/config.toml to add your API keys and customize settings:
# Global LLM configuration
[llm]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # Replace with your actual API key
max_tokens = 4096
temperature = 0.0
# Optional configuration for specific LLM models
[llm.vision]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # Replace with your actual API key
Running OpenManus
After completing the installation and configuration steps, you can run OpenManus with a single command. The specific command may vary depending on your setup, but generally, you can execute:
python main.py
Then input your idea via the terminal when prompted.
For the unstable version, you might need to use a different command as specified in the project documentation.
Building AI agents that interact with the web presents unique challenges. One of the most frustrating is the lack of a persistent browser session for AI. Imagine an AI assistant that has to log in to a website every time it needs to perform a task. This repetitive process is not only time-consuming but also disrupts the flow of information and can lead to errors. Fortunately, there's a solution: maintaining persistent browser sessions for your AI agents.
The Problem with Stateless AI Web Interactions
Without a persistent browser session, each interaction with a website is treated as a brand new visit. This means your AI agent loses all previous context, including login credentials, cookies, and browsing history. This “stateless” approach forces the agent to start from scratch each time, leading to:
Repetitive Logins: Constant login prompts hinder automation and slow down processes.
Loss of Context: Crucial information from previous interactions is lost, impacting the agent’s ability to perform complex tasks.
Inefficient Resource Use: Repeatedly loading websites and resources consumes unnecessary time and computing power.
The Power of Persistent Browser Sessions for AI
A persistent browser session for AI allows your agent to maintain a continuous connection with a website, preserving its state across multiple interactions. This means:
Eliminate Repetitive Logins: Your AI agent stays logged in, ready to perform tasks without interruption.
Preserve Context: Retain crucial information like cookies, browsing history, and form data for seamless task execution.
Streamline Workflow: Enable complex, multi-step automation without constantly restarting the process. This is crucial for tasks like web scraping, data extraction, and automated testing.
How Browser-Use Enables Persistent Sessions
Browser-Use offers a powerful solution for managing a persistent browser context for AI. By leveraging its features, you can easily create and maintain browser sessions, allowing your AI agents to operate with maximum efficiency. This functionality is especially beneficial for long-running AI browser sessions that require continuous interaction with web applications.
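Under the hood, persistence comes down to reusing a browser profile (cookies, local storage, saved logins) instead of launching a throwaway browser for every task; the web UI below exposes this through its CHROME_PERSISTENT_SESSION setting. As a rough, hedged illustration of the underlying idea, here is what a persistent context looks like when driven directly with Playwright (illustrative only, not Browser-Use's internal code):
# persistent_session_demo.py - illustrative sketch, not Browser-Use's implementation
from playwright.sync_api import sync_playwright

PROFILE_DIR = "./chrome-profile"  # cookies, logins, and history persist here across runs

with sync_playwright() as p:
    # Reuse the profile directory instead of a fresh, stateless browser instance
    context = p.chromium.launch_persistent_context(PROFILE_DIR, headless=False)
    page = context.new_page()
    page.goto("https://example.com")
    # ... the agent performs its task; any login or cookie stays saved in PROFILE_DIR ...
    context.close()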
Installation Guide
Prerequisites
Python 3.11 or higher
Git (for cloning the repository)
Option 1: Local Installation
Read the quickstart guide or follow the steps below to get started.
Step 1: Clone the Repository
git clone https://github.com/browser-use/web-ui.git
cd web-ui
Step 2: Set Up Python Environment
We recommend using uv for managing the Python environment.
Using uv (recommended):
uv venv --python 3.11
Activate the virtual environment:
Windows (Command Prompt):
.venv\Scripts\activate
Windows (PowerShell):
.\.venv\Scripts\Activate.ps1
macOS/Linux:
source .venv/bin/activate
Step 3: Install Dependencies
Install Python packages:
uv pip install -r requirements.txt
Install Playwright:
playwright install
Step 4: Configure Environment
Create a copy of the example environment file:
Windows (Command Prompt):
copy .env.example .env
macOS/Linux/Windows (PowerShell):
cp .env.example .env
Open .env in your preferred text editor and add your API keys and other settings
Option 2: Docker Installation
Clone the repository:
git clone https://github.com/browser-use/web-ui.git
cd web-ui
Create and configure environment file:
Windows (Command Prompt):
copy .env.example .env
macOS/Linux/Windows (PowerShell):
cp .env.example .env
Edit .env with your preferred text editor and add your API keys
Run with Docker:
# Build and start the container with default settings (browser closes after AI tasks)
docker compose up --build
# Or run with persistent browser (browser stays open between AI tasks)
CHROME_PERSISTENT_SESSION=true docker compose up --build
Access the Application:
Web Interface: Open http://localhost:7788 in your browser
VNC Viewer (for watching browser interactions): Open http://localhost:6080/vnc.html
Default VNC password: “youvncpassword”
Can be changed by setting VNC_PASSWORD in your .env file
Docker Setup
Environment Variables:
All configuration is done through the .env file
Available environment variables:
# LLM API Keys
OPENAI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
GOOGLE_API_KEY=your_key_here
# Browser Settings
CHROME_PERSISTENT_SESSION=true # Set to true to keep browser open between AI tasks
RESOLUTION=1920x1080x24 # Custom resolution format: WIDTHxHEIGHTxDEPTH
RESOLUTION_WIDTH=1920 # Custom width in pixels
RESOLUTION_HEIGHT=1080 # Custom height in pixels
# VNC Settings
VNC_PASSWORD=your_vnc_password # Optional, defaults to "vncpassword"
Platform Support:
Supports both AMD64 and ARM64 architectures
For ARM64 systems (e.g., Apple Silicon Macs), the container will automatically use the appropriate image
Browser Persistence Modes:
Default Mode (CHROME_PERSISTENT_SESSION=false):
Browser opens and closes with each AI task
Clean state for each interaction
Lower resource usage
Persistent Mode (CHROME_PERSISTENT_SESSION=true):
Browser stays open between AI tasks
Maintains history and state
Allows viewing previous AI interactions
Set in .env file or via environment variable when starting container
Viewing Browser Interactions:
Access the noVNC viewer at http://localhost:6080/vnc.html
Enter the VNC password (default: “vncpassword” or what you set in VNC_PASSWORD)
Direct VNC access available on port 5900 (mapped to container port 5901)
You can now see all browser interactions in real-time
Persistent browser sessions are essential for building efficient and robust AI agents that interact with the web. By eliminating repetitive logins, preserving context, and streamlining workflows, you can unlock the true potential of AI web automation. Explore Browser-Use and discover how its persistent session management can revolutionize your AI development process. Start building smarter, more efficient AI agents today!
Imagine having a 24/7 health companion that analyzes your medical history, tracks real-time vitals, and offers tailored advice—all while keeping your data private. This is the reality of AI health assistants: open-source tools merging artificial intelligence with healthcare to empower individuals and professionals alike. Let's dive into how these systems work, their transformative benefits, and how you can build one using platforms like OpenHealthForAll.
What Is an AI Health Assistant?
An AI health assistant is a digital tool that leverages machine learning, natural language processing (NLP), and data analytics to provide personalized health insights. For example:
OpenHealth consolidates blood tests, wearable data, and family history into structured formats, enabling GPT-powered conversations about your health.
Aiden, another assistant, uses WhatsApp to deliver habit-building prompts based on anonymized data from Apple Health or Fitbit.
These systems prioritize privacy, often running locally or using encryption to protect sensitive information.
Why AI Health Assistants Matter: 5 Key Benefits
Centralized Health Management: Integrate wearables, lab reports, and EHRs into one platform. OpenHealth, for instance, parses blood tests and symptoms into actionable insights using LLMs like Claude or Gemini.
Real-Time Anomaly Detection: Projects like Kavya Prabahar's virtual assistant use RNNs to flag abnormal heart rates or predict fractures from X-rays.
Privacy-First Design: Tools like Aiden anonymize data via Evervault and store records on blockchain (e.g., NearestDoctor's smart contracts) to ensure compliance with regulations like HIPAA.
Empathetic Patient Interaction: Assistants like OpenHealth use emotion-aware AI to provide compassionate guidance, reducing anxiety for users managing chronic conditions.
Cost-Effective Scalability: Open-source frameworks like Google's Open Health Stack (OHS) help developers build offline-capable solutions for low-resource regions, accelerating global healthcare access.
Challenges and Ethical Considerations
While promising, AI health assistants face hurdles:
Data Bias: Models trained on limited datasets may misdiagnose underrepresented groups.
Interoperability: Bridging EHR systems (e.g., HL7 FHIR) with AI requires standardization efforts like OHS.
Regulatory Compliance: Solutions must balance innovation with safety, as highlighted in Nature’s call for mandatory feedback loops in AI health tech.
Build Your Own AI Health Assistant: A Developer’s Guide
Step 1: Choose Your Stack
Data Parsing: Use OpenHealth’s Python-based parser (migrating to TypeScript soon) to structure inputs from wearables or lab reports.
AI Models: Integrate LLaMA or GPT-4 via APIs, or run models locally with Ollama for privacy (a minimal local-call sketch follows this list).
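For the local, privacy-preserving route, here is a minimal, hedged sketch of calling an Ollama model over its local HTTP API. It assumes Ollama is running on its default port 11434 and that a model such as llama3 has been pulled; the model name, prompt wording, and sample data are placeholders:
# ask_local_model.py - minimal sketch of a local LLM call via Ollama's HTTP API
import requests

def ask_health_assistant(question: str, context: str) -> str:
    """Send structured health data plus a question to a locally running model."""
    prompt = f"Patient data:\n{context}\n\nQuestion: {question}"
    resp = requests.post(
        "http://localhost:11434/api/generate",   # Ollama's default generate endpoint
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]               # generated text

if __name__ == "__main__":
    print(ask_health_assistant("Is this fasting glucose in range?",
                               "Fasting glucose: 95 mg/dL (lab report, 2024-01-10)"))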
Step 2: Prioritize Security
Encrypt user data with Supabase or Evervault.
Implement blockchain for audit trails, as seen in NearestDoctor’s medical records system.
Step 3: Start the setup
Clone the Repository:
git clone https://github.com/OpenHealthForAll/open-health.git
cd open-health
Setup and Run:
# Copy environment file
cp .env.example .env
# Add API keys to .env file:
# UPSTAGE_API_KEY - For parsing (You can get $10 credit without card registration by signing up at https://www.upstage.ai)
# OPENAI_API_KEY - For enhanced parsing capabilities
# Start the application using Docker Compose
docker compose --env-file .env up
For existing users, use:
docker compose --env-file .env up --build
Access OpenHealth: Open your browser and navigate to http://localhost:3000 to begin using OpenHealth.
The Future of AI Health Assistants
Decentralized AI Marketplaces: Platforms like Ocean Protocol could let users monetize health models securely.
AI-Powered Diagnostics: Google’s Health AI Developer Foundations aim to simplify building diagnostic tools for conditions like diabetes.
Global Accessibility: Initiatives like OHS workshops in Kenya and India are democratizing AI health tech.
Your Next Step
Contribute to OpenHealth’s GitHub repo to enhance its multilingual support.
DeepSeek R1 Distill: Complete Tutorial for Deployment & Fine-Tuning
Are you eager to explore the capabilities of the DeepSeek R1 Distill model? This guide provides a comprehensive, step-by-step approach to deploying the uncensored DeepSeek R1 Distill model to Google Cloud Run with GPU support, and also walks you through a practical fine-tuning process. The tutorial is broken down into the following sections:
Environment Setup
FastAPI Inference Server
Docker Configuration
Google Cloud Run Deployment
Fine-Tuning Pipeline
Let’s dive in and get started.
1. Environment Setup
Before deploying and fine-tuning, make sure you have the required tools installed and configured.
1.1 Install Required Tools
Python 3.9+
pip: For Python package installation
Docker: For containerization
Google Cloud CLI: For deployment
Install Google Cloud CLI (Ubuntu/Debian): Follow the official Google Cloud CLI installation guide to install gcloud.
1.2 Authenticate with Google Cloud
Run the following commands to initialize and authenticate with Google Cloud:
gcloud init
gcloud auth application-default login
Ensure you have an active Google Cloud project with Cloud Run, Compute Engine, and Container Registry/Artifact Registry enabled.
2. FastAPI Inference Server
We’ll create a minimal FastAPI application that serves two main endpoints:
/v1/inference: For model inference.
/v1/finetune: For uploading fine-tuning data (JSONL).
Create a file named main.py with the following content:
# main.py
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse
from pydantic import BaseModel
import json
import litellm # Minimalistic LLM library
app = FastAPI()
class InferenceRequest(BaseModel):
prompt: str
max_tokens: int = 512
@app.post("/v1/inference")
async def inference(request: InferenceRequest):
"""
Inference endpoint using deepseek-r1-distill-7b (uncensored).
"""
response = litellm.completion(
model="deepseek/deepseek-r1-distill-7b",
messages=[{"role": "user", "content": request.prompt}],
max_tokens=request.max_tokens
)
    # The litellm response object is not directly JSON-serializable; return just the generated text
    return JSONResponse(content={"completion": response.choices[0].message.content})
@app.post("/v1/finetune")
async def finetune(file: UploadFile = File(...)):
"""
Fine-tune endpoint that accepts a JSONL file.
"""
if not file.filename.endswith('.jsonl'):
return JSONResponse(
status_code=400,
content={"error": "Only .jsonl files are accepted for fine-tuning"}
)
# Read lines from uploaded file
data = [json.loads(line) for line in file.file]
# Perform or schedule a fine-tuning job here (simplified placeholder)
# You can integrate with your training pipeline below.
return JSONResponse(content={"status": "Fine-tuning request received", "samples": len(data)})
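Once the server is running locally (for example with uvicorn main:app --host 0.0.0.0 --port 8080; the port is an assumption, not something fixed by this tutorial), you can exercise the inference endpoint with a small client sketch. After deployment, swap the base URL for your Cloud Run service URL:
# client_demo.py - hedged example of calling the /v1/inference endpoint
import requests

BASE_URL = "http://localhost:8080"  # assumption: local uvicorn; use your Cloud Run URL after deploy

payload = {"prompt": "Explain what model distillation is in two sentences.", "max_tokens": 256}
resp = requests.post(f"{BASE_URL}/v1/inference", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())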
3. Docker Configuration
To containerize the application, create a requirements.txt file listing the server's dependencies (for this sketch, at minimum fastapi, uvicorn, litellm, and python-multipart for the file-upload endpoint), along with a Dockerfile that installs them and starts the FastAPI app.
4. Google Cloud Run Deployment
Deploy the container with a single gcloud run deploy invocation. The command builds the Docker image, deploys it to Cloud Run with one nvidia-l4 GPU, allocates 16 GiB of memory and 4 CPU cores, and exposes the service publicly (no authentication).
5. Fine-Tuning Pipeline
This section will guide you through a basic four-stage fine-tuning pipeline similar to DeepSeek R1’s training approach.
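The full four-stage pipeline is more than a single snippet, but the supervised stage can be sketched as follows. This is a hedged, minimal example using Hugging Face transformers with a LoRA adapter from peft; the base checkpoint, the prompt/completion field names in the JSONL, file paths, and hyperparameters are all placeholder assumptions, and this is not DeepSeek's actual training code:
# sft_lora_sketch.py - minimal supervised fine-tuning sketch (illustrative, not DeepSeek's pipeline)
import json
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumption: swap in the checkpoint you use

# 1. Load JSONL training data; assumes one {"prompt": ..., "completion": ...} object per line
records = [json.loads(line) for line in open("train.jsonl") if line.strip()]
texts = [r["prompt"] + "\n" + r["completion"] for r in records]

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

dataset = Dataset.from_dict({"text": texts}).map(tokenize, batched=True, remove_columns=["text"])

# 2. Wrap the base model with a small LoRA adapter so only a fraction of the weights are trained
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto")
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# 3. Standard causal-LM training loop via Trainer
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-distill-sft", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("r1-distill-sft/adapter")  # saves only the LoRA weights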
Eliza is a versatile multi-agent simulation framework, built in TypeScript, that allows you to create sophisticated, autonomous AI agents. These agents can interact across multiple platforms while maintaining consistent personalities and knowledge. A key feature that enables this flexibility is the ability to define custom actions and skills. This article will delve into how you can leverage this feature to make your Eliza agents even more powerful.
Understanding Actions in Eliza
Actions are the fundamental building blocks that dictate how Eliza agents respond to and interact with messages. They allow agents to go beyond simple text replies, enabling them to:
Interact with external systems.
Modify their behavior dynamically.
Perform complex tasks.
Each action in Eliza consists of several key components:
name: A unique identifier for the action.
similes: Alternative names or triggers that can invoke the action.
description: A detailed explanation of what the action does.
validate: A function that checks if the action is appropriate to execute in the current context.
handler: The implementation of the action’s behavior – the core logic that the action performs.
examples: Sample conversations demonstrating proper usage patterns.
suppressInitialMessage: When set to true, it prevents the initial message from being sent before processing the action.
Built-in Actions
Eliza includes several built-in actions to manage basic conversation flow and external integrations:
CONTINUE: Keeps a conversation going when more context is required.
IGNORE: Gracefully disengages from a conversation.
NONE: Default action for standard conversational replies.
TAKE_ORDER: Records and processes user purchase orders (primarily for Solana integration).
Here’s an example of how to structure your action file:
import { Action, IAgentRuntime, Memory } from "@elizaos/core";
export const myAction: Action = {
name: "MY_ACTION",
similes: ["SIMILAR_ACTION", "ALTERNATE_NAME"],
validate: async (runtime: IAgentRuntime, message: Memory) => {
// Validation logic here
return true;
},
description: "A detailed description of your action.",
handler: async (runtime: IAgentRuntime, message: Memory) => {
// The actual logic of your action
return true;
},
};
Implementing a Custom Action
Validation: Before an action is executed, its validate function is called to determine whether it can proceed; it checks that all the prerequisites for the action are met.
Handler: The handler function contains the core logic of the action. It interacts with the agent runtime and memory and performs the desired tasks, such as calling external APIs, processing data, or generating output.
Examples of Custom Actions
Following the structure above, custom actions can, for example, call an external API and return the result to the user, process or summarize data on demand, or trigger platform-specific behaviour as part of the agent's reply.
Custom actions and skills are crucial for unlocking the full potential of Eliza. By creating your own actions, you can tailor Eliza to specific use cases, whether it’s automating complex workflows, integrating with external services, or creating unique, engaging interactions. The flexibility and power provided by this system allow you to push the boundaries of what’s possible with autonomous AI agents.
Artificial Intelligence is rapidly changing, and AI Agents by Google are at the forefront. These aren’t typical AI models. Instead, they are complex systems. They can reason, make logical decisions, and interact with the world using tools. This article explores what makes them special. Furthermore, it will examine how they are changing AI applications.
Understanding AI Agents
Essentially, AI Agents by Google are applications. Their aim is to achieve goals. They do this by observing their environment. They also use available tools. Unlike basic AI, agents are autonomous. They act independently. Moreover, they proactively make decisions. This helps them meet objectives, even without direct instructions. This is possible through their cognitive architecture, which includes three key parts (a minimal sketch of how they fit together follows the list):
The Model: This is the core language model. It is the central decision-maker. It uses reasoning frameworks like ReAct. Also, it uses Chain-of-Thought and Tree-of-Thoughts.
The Tools: These are crucial for external interaction. They allow the agent to connect to real-time data and services. For example, APIs can be used. They bridge the gap between internal knowledge and outside resources.
The Orchestration Layer: This layer manages the agent’s process. It determines how it takes in data. Then, it reasons internally. Finally, it informs the next action or decision in a continuous cycle.
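To make the three parts concrete, here is a minimal, hedged sketch of an orchestration loop in Python. The model call is stubbed out and the tool registry holds a single toy function; this is not Google's implementation, just an illustration of how model, tools, and orchestration interact:
# agent_loop_sketch.py - toy illustration of model + tools + orchestration (not Google's code)
import json

def call_model(conversation):
    """Stand-in for the core language model (the decision-maker)."""
    # A real agent would send `conversation` to an LLM and parse its reply.
    # Here we pretend the model asks for the weather tool once, then answers.
    if not any(turn["role"] == "tool" for turn in conversation):
        return {"action": "get_weather", "args": {"city": "Paris"}}
    return {"answer": "It is sunny in Paris, so no umbrella needed."}

# The tools: bridges to the outside world (APIs, databases, ...)
TOOLS = {
    "get_weather": lambda city: {"city": city, "forecast": "sunny"},
}

def run_agent(user_goal: str) -> str:
    """The orchestration layer: observe, reason, act, and repeat until an answer emerges."""
    conversation = [{"role": "user", "content": user_goal}]
    for _ in range(5):  # hard stop so the loop cannot run forever
        decision = call_model(conversation)
        if "answer" in decision:              # the model has reached a final response
            return decision["answer"]
        tool = TOOLS[decision["action"]]      # otherwise, execute the requested tool
        result = tool(**decision["args"])
        conversation.append({"role": "tool", "content": json.dumps(result)})
    return "Gave up after too many steps."

print(run_agent("Do I need an umbrella in Paris today?"))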
AI Agents vs. Traditional AI Models
Traditional AI models have limitations. They are restricted by training data. They perform single inferences. In contrast, AI Agents by Google overcome these limits. They do this through several capabilities:
External System Access: They connect to external systems via tools. Thus, they interact with real-time data.
Session History Management: Agents track and manage session history. This enables multi-turn interactions with context.
Native Tool Implementation: They include built-in tools. This allows seamless execution of external tasks.
Cognitive Architectures: They utilize advanced frameworks. For instance, they use CoT and ReAct for reasoning.
The Role of Tools: Extensions, Functions, and Data Stores
AI Agents by Google interact with the outside world through three key tools:
Extensions
These tools bridge agents and APIs. They teach agents how to use an API to carry out actions, with worked examples. For instance, they can use the Google Flights API. Extensions run on the agent side. They are designed to make integrations scalable and robust.
Functions
Functions are self-contained code modules. Models use them for specific tasks. Unlike Extensions, these run on the client side. They don’t directly interact with APIs. This gives developers greater control over data flow and system execution.
Data Stores
Data Stores enable agents to access diverse data. This includes structured and unstructured data from various sources. For instance, they can access websites, PDFs, and databases. This dynamic interaction with current data enhances the model’s knowledge. Furthermore, it aids applications using Retrieval Augmented Generation (RAG).
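As a rough, hedged sketch of the Data Store idea: retrieval-augmented generation amounts to fetching the most relevant stored snippets and folding them into the prompt. The toy retriever below uses simple word overlap instead of a real vector store, and the sample documents are invented; a production agent would hand the assembled prompt to its model rather than print it:
# rag_sketch.py - toy retrieval-augmented prompt assembly (word overlap instead of embeddings)
DOCUMENTS = [
    "Refunds are available within 30 days of purchase with a valid receipt.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Shipping to Europe usually takes 5 to 7 business days.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank stored snippets by naive word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(DOCUMENTS, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    """Fold the retrieved snippets into the prompt the model will see."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
# A real agent would now send this prompt to its model instead of printing it.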
Improving Agent Performance
To get the best results, AI Agents need targeted learning. These methods include:
In-context learning: Examples provided during inference let the model learn “on-the-fly.”
Retrieval-based in-context learning: External memory enhances this process. It provides more relevant examples.
Fine-tuning based learning: The model is trained on a larger set of task-specific examples before inference. This improves its understanding of tools. Moreover, it improves its ability to know when to use them.
Getting Started with AI Agents
If you’re interested in building with AI Agents, consider using libraries like LangChain. Also, you might use platforms such as Google’s Vertex AI. LangChain helps users ‘chain’ sequences of logic and tool calls. Meanwhile, Vertex AI offers a managed environment. It supports building and deploying production-ready agents.
AI Agents by Google are transforming AI. They go beyond traditional limits. They can reason, use tools, and interact with the external world. Therefore, they are a major step forward. They create more flexible and capable AI systems. As these agents evolve, their ability to solve complex problems will also grow. In addition, their capacity to drive real-world value will expand.
Read more in the AI Agents whitepaper by Google.