LangChain SDK
Use MrScraper as LangChain tools via the langchain-mrscraper package—fetch HTML, run AI and manual scrapers, and list results from agents.
The langchain-mrscraper package exposes the MrScraper API as LangChain BaseTool instances, enabling AI agents to fetch rendered HTML, create and rerun scrapers, and retrieve results. The SDK client (mrscraper-sdk) is installed as a dependency.
Tools-First Integration
This integration provides tools that agents can explicitly call—not a document loader. For deterministic "URL → documents" ingestion into vector stores, a document loader is typically a better fit. MrScraper may offer that in a separate package later.
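Until a dedicated loader exists, you can approximate "URL → documents" ingestion yourself by wrapping tool output in document-shaped records. The sketch below uses plain dicts instead of LangChain's `Document` class to stay dependency-free; the `fetch_page` stub is a hypothetical stand-in for invoking the `mrscraper_fetch_html` tool.

```python
# Sketch: turning fetched HTML into document-shaped records for ingestion.
# `fetch_page` is a placeholder for fetch_html.invoke({"url": url});
# swap in the real tool call in your own pipeline.

def fetch_page(url: str) -> str:
    """Placeholder for the mrscraper_fetch_html tool."""
    return f"<html><body>Content of {url}</body></html>"

def load_documents(urls: list[str]) -> list[dict]:
    """Build Document-like dicts (page_content + metadata) from URLs."""
    return [
        {"page_content": fetch_page(url), "metadata": {"source": url}}
        for url in urls
    ]

docs = load_documents(["https://example.com/a", "https://example.com/b"])
print(docs[0]["metadata"]["source"])  # https://example.com/a
```

In a real pipeline you would replace the dicts with `langchain_core.documents.Document` instances before handing them to a vector store.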
Requirements
- Python 3.9 or higher
- LangChain ecosystem packages (installed as dependencies)
Package Information
See langchain-mrscraper on PyPI for the latest release, version history, and installation details.
Installation
Install the package from PyPI using pip:
pip install -U langchain-mrscraper

This installs both the LangChain integration and the underlying MrScraper SDK client.
Authentication
Set your API key from the MrScraper dashboard as an environment variable:
import os
os.environ["MRSCRAPER_API_KEY"] = "YOUR_MRSCRAPER_API_TOKEN"

Authentication Options
You can authenticate in three ways:
- Environment variable (recommended): MRSCRAPER_API_KEY or MRSCRAPER_API_TOKEN
- Toolkit parameter: MrScraperToolkit(token="...")
- Function parameter: load_mrscraper_tools(mrscraper_api_key="...")
For more information on generating API tokens, see the Generate Token guide.
Quick Start
Using MrScraperToolkit
Load all available tools with the toolkit:
from langchain_mrscraper import MrScraperToolkit
# Using environment variable
tools = MrScraperToolkit().get_tools()
# Or with explicit token
tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()

Using Convenience Loader
Alternatively, use the convenience function:
from langchain_mrscraper import load_mrscraper_tools
tools = load_mrscraper_tools()

Using with AI Agents
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_mrscraper import MrScraperToolkit
# Load MrScraper tools
tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()
# Create agent with OpenAI
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools)
# Use the agent
response = agent.invoke({
    "messages": [{"role": "user", "content": "Fetch the HTML from https://example.com"}]
})

Prerequisites
This example requires langgraph and langchain-openai packages:
pip install langgraph langchain-openai

Core Methods
The examples below demonstrate each tool individually. Replace placeholders such as scraper IDs and result IDs with values from your dashboard or prior API responses.
Fetch Raw HTML
Fetch fully rendered HTML after JavaScript execution using the stealth browser.
from langchain_mrscraper import load_mrscraper_tools
fetch_html, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_fetch_html"],
)
output = fetch_html.invoke(
    {
        "url": "https://example.com/page",
        "timeout": 120,
        "geo_code": "US",
        "block_resources": False,
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | - | Full URL to load |
timeout | integer | No | 120 | Maximum seconds to wait for page load |
geo_code | string | No | "US" | Two-letter proxy country code |
block_resources | boolean | No | False | Block images, CSS, and fonts for faster loading |
Create AI Scraper
Create and run an AI scraper using natural language instructions.
from langchain_mrscraper import load_mrscraper_tools
create, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_create_scraper"],
)
output = create.invoke(
    {
        "url": "https://example.com/products",
        "message": "Extract product names, prices, and ratings.",
        "agent": "listing",  # "general" | "listing" | "map"
        "proxy_country": "US",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | - | Target URL to scrape |
message | string | Yes | - | Natural language extraction instructions |
agent | string | No | "general" | Agent type: "general", "listing", or "map" |
proxy_country | string | No | - | Two-letter country code for proxy |
max_depth | integer | No | 2 | For map agent: link depth to follow |
max_pages | integer | No | 50 | For map/listing agents: maximum pages to scrape |
limit | integer | No | 1000 | For map agent: maximum results to return |
include_patterns | string | No | "" | For map agent: regex patterns for URLs to include |
exclude_patterns | string | No | "" | For map agent: regex patterns for URLs to exclude |
Agent Types:
| Agent | Best For |
|---|---|
"general" | Single pages, product details, articles |
"listing" | Product listings, search results, directories |
"map" | Site crawling, URL discovery, sitemaps |
For detailed information on agents, see AI Scraper Agents.
Rerun AI Scraper
Rerun an existing AI scraper on a new URL without creating a new scraper configuration.
from langchain_mrscraper import load_mrscraper_tools
rerun_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_ai_scraper"],
)
output = rerun_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "url": "https://example.com/another-page",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scraper_id | string | Yes | ID of the existing AI scraper |
url | string | Yes | New URL to scrape |
max_depth | integer | No | For map agents: link depth to follow |
max_pages | integer | No | For map/listing agents: maximum pages to scrape |
limit | integer | No | For map agents: maximum results to return |
include_patterns | string | No | For map agents: URL patterns to include |
exclude_patterns | string | No | For map agents: URL patterns to exclude |
Bulk Rerun AI Scraper
Run an existing AI scraper on multiple URLs in a single request.
from langchain_mrscraper import load_mrscraper_tools
bulk_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_ai_scraper"],
)
output = bulk_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "urls": [
            "https://example.com/a",
            "https://example.com/b",
        ],
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scraper_id | string | Yes | ID of the existing AI scraper |
urls | array | Yes | List of URLs to scrape |
Performance Tip
This is more efficient than calling mrscraper_rerun_ai_scraper once per URL. Use Get Result by ID to retrieve individual results.
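Very large URL lists may still need to be split across several bulk calls. The helper below sketches that; the chunk size of 100 is an illustrative assumption, not a documented API limit, so check your plan's actual limits.

```python
# Split a large URL list into batches for multiple bulk rerun calls.
# The batch size of 100 is an assumption for illustration, not an API limit.

def chunk_urls(urls: list[str], size: int = 100) -> list[list[str]]:
    return [urls[i : i + size] for i in range(0, len(urls), size)]

urls = [f"https://example.com/p/{i}" for i in range(250)]
batches = chunk_urls(urls)
print(len(batches))  # 3

# Each batch can then be passed as the "urls" argument of one bulk call:
# bulk_ai.invoke({"scraper_id": "...", "urls": batch})
```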
Rerun Manual Scraper
Rerun an existing manual scraper (created in the MrScraper dashboard) on a new URL.
from langchain_mrscraper import load_mrscraper_tools
rerun_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_manual_scraper"],
)
output = rerun_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "url": "https://example.com/target",
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scraper_id | string | Yes | ID of the manual scraper from the dashboard |
url | string | Yes | URL to scrape |
Bulk Rerun Manual Scraper
Run an existing manual scraper on multiple URLs in a single request.
from langchain_mrscraper import load_mrscraper_tools
bulk_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_manual_scraper"],
)
output = bulk_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "urls": [
            "https://example.com/one",
            "https://example.com/two",
        ],
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scraper_id | string | Yes | ID of the manual scraper from the dashboard |
urls | array | Yes | List of URLs to scrape |
Performance Tip
This is more efficient than calling mrscraper_rerun_manual_scraper once per URL. Use Get Result by ID to retrieve individual results.
Retrieving Results
Get All Results
Retrieve a paginated list of scraping results with filtering and sorting options.
from langchain_mrscraper import load_mrscraper_tools
list_results, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_all_results"],
)
output = list_results.invoke(
    {
        "sort_field": "updatedAt",
        "sort_order": "DESC",
        "page_size": 10,
        "page": 1,
        "search": None,
        "date_range_column": None,
        "start_at": None,
        "end_at": None,
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
sort_field | string | No | "updatedAt" | Field to sort by |
sort_order | string | No | "DESC" | Sort direction: "ASC" or "DESC" |
page_size | integer | No | 10 | Number of results per page |
page | integer | No | 1 | Page number (starting at 1) |
search | string | No | None | Optional free-text filter |
date_range_column | string | No | None | Date field to filter by |
start_at | string | No | None | Start date (ISO 8601 format) |
end_at | string | No | None | End date (ISO 8601 format) |
Supported sort_field values:
"createdAt", "updatedAt", "id", "type", "url", "status", "error", "tokenUsage", "runtime"
Get Result by ID
Fetch a single scraping result using its unique ID.
from langchain_mrscraper import load_mrscraper_tools
get_one, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_result_by_id"],
)
output = get_one.invoke({"result_id": "YOUR_RESULT_ID"})
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
result_id | string | Yes | Unique identifier of the result |
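When a scrape runs asynchronously, an agent may need to poll a result until it finishes. The `status` field and its values below are assumptions for illustration, not documented payload fields; the `get_result` stub stands in for `get_one.invoke`.

```python
import json
import time

# Sketch: poll a result until it leaves a pending state.
# The "status" field and its values are assumed -- inspect real
# tool output and adjust the condition accordingly.

def get_result(result_id: str, _state={"calls": 0}) -> str:
    """Stand-in for get_one.invoke({"result_id": result_id})."""
    _state["calls"] += 1  # simulate a job that finishes on the third poll
    status = "completed" if _state["calls"] >= 3 else "pending"
    return json.dumps({"id": result_id, "status": status})

def wait_for_result(result_id: str, interval: float = 0.0, attempts: int = 10) -> dict:
    for _ in range(attempts):
        result = json.loads(get_result(result_id))
        if result.get("status") != "pending":
            return result
        time.sleep(interval)
    raise TimeoutError(f"result {result_id} still pending after {attempts} attempts")

print(wait_for_result("abc123")["status"])  # completed
```

In production, use a real `interval` (a few seconds) so you do not hammer the API.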
Tool Loading Options
You can obtain tools in several ways:
1. Toolkit (Recommended)
from langchain_mrscraper import MrScraperToolkit
tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()

2. Convenience Loader
from langchain_mrscraper import load_mrscraper_tools
tools = load_mrscraper_tools(token="YOUR_MRSCRAPER_API_TOKEN")

3. Selective Tool Loading
from langchain_mrscraper import load_mrscraper_tools
tools = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=[
        "mrscraper_fetch_html",
        "mrscraper_get_result_by_id",
    ],
)

4. Per-Tool Constructors
from langchain_mrscraper.tools import MrScraperFetchHtmlTool
fetch_html = MrScraperFetchHtmlTool(token="YOUR_MRSCRAPER_API_TOKEN")

Tool Outputs
All tools return JSON strings (pretty-printed) from the MrScraper API, making them easy to parse and process in your agent workflows.
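Because every tool returns a JSON string, you can turn the output into Python objects with the standard library. The field names below (`status`, `data`) are illustrative assumptions about the payload shape, not documented keys:

```python
import json

# Tools return pretty-printed JSON strings; parse them with json.loads.
# The keys used here are illustrative -- real payloads depend on the endpoint.
output = """{
  "status": "completed",
  "data": [{"name": "Widget", "price": 9.99}]
}"""

result = json.loads(output)
print(result["status"])            # completed
print(result["data"][0]["price"])  # 9.99
```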
Async Usage
All tools implement _arun for async callers. In async code, you can use the tool's async entry point when your LangChain version supports it:
import asyncio
from langchain_mrscraper import load_mrscraper_tools
async def async_scrape():
    fetch_html, = load_mrscraper_tools(
        token="YOUR_MRSCRAPER_API_TOKEN",
        tool_names=["mrscraper_fetch_html"],
    )
    # Use ainvoke for async execution
    output = await fetch_html.ainvoke({
        "url": "https://example.com",
        "timeout": 120,
    })
    print(output)

asyncio.run(async_scrape())

Async Support
Check your LangChain version to confirm ainvoke is available on BaseTool. Most recent versions include async support.
Best Practices
Token Management
Store your API token securely:
import os

# Load from .env file
from dotenv import load_dotenv

load_dotenv()
token = os.getenv("MRSCRAPER_API_KEY")

Error Handling
Implement error handling in your agent workflows:
from langchain_mrscraper import load_mrscraper_tools
try:
    fetch_html, = load_mrscraper_tools(
        token="YOUR_MRSCRAPER_API_TOKEN",
        tool_names=["mrscraper_fetch_html"],
    )
    output = fetch_html.invoke({"url": "https://example.com"})
except Exception as e:
    print(f"Scraping error: {e}")

Tool Selection
Only load the tools your agent needs:
# Good: Only load needed tools
tools = load_mrscraper_tools(
    tool_names=["mrscraper_fetch_html", "mrscraper_create_scraper"]
)
# Avoid: Loading all tools when only using a few
tools = MrScraperToolkit().get_tools() # All 8 tools