LangChain SDK

Use MrScraper as LangChain tools via the langchain-mrscraper package—fetch HTML, run AI and manual scrapers, and list results from agents.

The langchain-mrscraper package exposes the MrScraper API as LangChain BaseTool instances, enabling AI agents to fetch rendered HTML, create and rerun scrapers, and retrieve results. The SDK client (mrscraper-sdk) is installed as a dependency.

Tools-First Integration

This integration provides tools that agents can explicitly call—not a document loader. For deterministic "URL → documents" ingestion into vector stores, a document loader is typically a better fit. MrScraper may offer that in a separate package later.

Requirements

  • Python 3.9 or higher
  • LangChain ecosystem packages (installed as dependencies)

Package Information

See langchain-mrscraper on PyPI for the latest release, version history, and installation details.

Installation

Install the package from PyPI using pip:

pip install -U langchain-mrscraper

This installs both the LangChain integration and the underlying MrScraper SDK client.

Authentication

Set your API key from the MrScraper dashboard as an environment variable:

import os

os.environ["MRSCRAPER_API_KEY"] = "YOUR_MRSCRAPER_API_TOKEN"

Authentication Options

You can authenticate in three ways:

  1. Environment variable (recommended): MRSCRAPER_API_KEY or MRSCRAPER_API_TOKEN
  2. Toolkit parameter: MrScraperToolkit(token="...")
  3. Function parameter: load_mrscraper_tools(mrscraper_api_key="...")

For more information on generating API tokens, see the Generate Token guide.

Quick Start

Using MrScraperToolkit

Load all available tools with the toolkit:

from langchain_mrscraper import MrScraperToolkit

# Using environment variable
tools = MrScraperToolkit().get_tools()

# Or with explicit token
tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()

Using Convenience Loader

Alternatively, use the convenience function:

from langchain_mrscraper import load_mrscraper_tools

tools = load_mrscraper_tools()

Using with AI Agents

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_mrscraper import MrScraperToolkit

# Load MrScraper tools
tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()

# Create agent with OpenAI
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools)

# Use the agent
response = agent.invoke({
    "messages": [{"role": "user", "content": "Fetch the HTML from https://example.com"}]
})

Prerequisites

This example requires the langgraph and langchain-openai packages:

pip install langgraph langchain-openai

Core Methods

The examples below demonstrate each tool individually. Replace placeholder values (API tokens, scraper IDs, result IDs) with values from your dashboard or earlier API responses.

Fetch Raw HTML

Fetch fully rendered HTML after JavaScript execution using the stealth browser.

from langchain_mrscraper import load_mrscraper_tools

fetch_html, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_fetch_html"],
)

output = fetch_html.invoke(
    {
        "url": "https://example.com/page",
        "timeout": 120,
        "geo_code": "US",
        "block_resources": False,
    }
)
print(output)

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Full URL to load |
| timeout | integer | No | 120 | Maximum seconds to wait for page load |
| geo_code | string | No | "US" | Two-letter proxy country code |
| block_resources | boolean | No | False | Block images, CSS, and fonts for faster loading |

Create AI Scraper

Create and run an AI scraper using natural language instructions.

from langchain_mrscraper import load_mrscraper_tools

create, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_create_scraper"],
)

output = create.invoke(
    {
        "url": "https://example.com/products",
        "message": "Extract product names, prices, and ratings.",
        "agent": "listing",  # "general" | "listing" | "map"
        "proxy_country": "US",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Target URL to scrape |
| message | string | Yes | - | Natural language extraction instructions |
| agent | string | No | "general" | Agent type: "general", "listing", or "map" |
| proxy_country | string | No | - | Two-letter country code for proxy |
| max_depth | integer | No | 2 | For map agent: link depth to follow |
| max_pages | integer | No | 50 | For map/listing agents: maximum pages to scrape |
| limit | integer | No | 1000 | For map agent: maximum results to return |
| include_patterns | string | No | "" | For map agent: regex patterns for URLs to include |
| exclude_patterns | string | No | "" | For map agent: regex patterns for URLs to exclude |

Agent Types:

| Agent | Best For |
|---|---|
| "general" | Single pages, product details, articles |
| "listing" | Product listings, search results, directories |
| "map" | Site crawling, URL discovery, sitemaps |

For detailed information on agents, see AI Scraper Agents.

Rerun AI Scraper

Rerun an existing AI scraper on a new URL without creating a new scraper configuration.

from langchain_mrscraper import load_mrscraper_tools

rerun_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_ai_scraper"],
)

output = rerun_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "url": "https://example.com/another-page",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| scraper_id | string | Yes | ID of the existing AI scraper |
| url | string | Yes | New URL to scrape |
| max_depth | integer | No | For map agents: link depth to follow |
| max_pages | integer | No | For map/listing agents: maximum pages to scrape |
| limit | integer | No | For map agents: maximum results to return |
| include_patterns | string | No | For map agents: URL patterns to include |
| exclude_patterns | string | No | For map agents: URL patterns to exclude |

Bulk Rerun AI Scraper

Run an existing AI scraper on multiple URLs in a single request.

from langchain_mrscraper import load_mrscraper_tools

bulk_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_ai_scraper"],
)

output = bulk_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "urls": [
            "https://example.com/a",
            "https://example.com/b",
        ],
    }
)
print(output)

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| scraper_id | string | Yes | ID of the existing AI scraper |
| urls | array | Yes | List of URLs to scrape |

Performance Tip

This is more efficient than calling mrscraper_rerun_ai_scraper once per URL. Use Get Result by ID to retrieve individual results.
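Because every tool returns a JSON string, the per-URL result IDs from a bulk run can be collected and then passed one by one to Get Result by ID. The field names below (`results`, `result_id`) are illustrative assumptions, not the documented response schema; verify them against an actual bulk response before relying on them:

```python
import json

# Hypothetical bulk response shape, standing in for the string returned by
# mrscraper_bulk_rerun_ai_scraper. Verify field names against a real payload.
bulk_output = json.dumps({
    "results": [
        {"url": "https://example.com/a", "result_id": "res_001"},
        {"url": "https://example.com/b", "result_id": "res_002"},
    ]
})

parsed = json.loads(bulk_output)
result_ids = [item["result_id"] for item in parsed["results"]]
# Each ID can then be passed to mrscraper_get_result_by_id.
```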

Rerun Manual Scraper

Rerun an existing manual scraper (created in the MrScraper dashboard) on a new URL.

from langchain_mrscraper import load_mrscraper_tools

rerun_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_manual_scraper"],
)

output = rerun_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "url": "https://example.com/target",
    }
)
print(output)

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| scraper_id | string | Yes | ID of the manual scraper from the dashboard |
| url | string | Yes | URL to scrape |

Bulk Rerun Manual Scraper

Run an existing manual scraper on multiple URLs in a single request.

from langchain_mrscraper import load_mrscraper_tools

bulk_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_manual_scraper"],
)

output = bulk_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "urls": [
            "https://example.com/one",
            "https://example.com/two",
        ],
    }
)
print(output)

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| scraper_id | string | Yes | ID of the manual scraper from the dashboard |
| urls | array | Yes | List of URLs to scrape |

Performance Tip

This is more efficient than calling mrscraper_rerun_manual_scraper once per URL. Use Get Result by ID to retrieve individual results.

Retrieving Results

Get All Results

Retrieve a paginated list of scraping results with filtering and sorting options.

from langchain_mrscraper import load_mrscraper_tools

list_results, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_all_results"],
)

output = list_results.invoke(
    {
        "sort_field": "updatedAt",
        "sort_order": "DESC",
        "page_size": 10,
        "page": 1,
        "search": None,
        "date_range_column": None,
        "start_at": None,
        "end_at": None,
    }
)
print(output)

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sort_field | string | No | "updatedAt" | Field to sort by |
| sort_order | string | No | "DESC" | Sort direction: "ASC" or "DESC" |
| page_size | integer | No | 10 | Number of results per page |
| page | integer | No | 1 | Page number (starting at 1) |
| search | string | No | None | Optional free-text filter |
| date_range_column | string | No | None | Date field to filter by |
| start_at | string | No | None | Start date (ISO 8601 format) |
| end_at | string | No | None | End date (ISO 8601 format) |

Supported sort_field values:

"createdAt", "updatedAt", "id", "type", "url", "status", "error", "tokenUsage", "runtime"
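To walk every page, keep requesting until a page comes back shorter than page_size. The loop below is a minimal sketch of that pattern; `page_fn` is a stand-in for invoking mrscraper_get_all_results and parsing its JSON string into a list, and the stub used in the demo is purely illustrative:

```python
def iter_all_results(page_fn, page_size=10):
    """Yield results page by page until a short page signals the end."""
    page = 1
    while True:
        batch = page_fn(page=page, page_size=page_size)
        yield from batch
        if len(batch) < page_size:
            return
        page += 1

# Demo with a stub in place of the real tool call:
data = list(range(25))

def fake_page(page, page_size):
    start = (page - 1) * page_size
    return data[start:start + page_size]

items = list(iter_all_results(fake_page, page_size=10))  # three pages: 10 + 10 + 5
```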

Get Result by ID

Fetch a single scraping result using its unique ID.

from langchain_mrscraper import load_mrscraper_tools

get_one, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_result_by_id"],
)

output = get_one.invoke({"result_id": "YOUR_RESULT_ID"})
print(output)

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| result_id | string | Yes | Unique identifier of the result |

Tool Loading Options

You can obtain tools in several ways:

1. Toolkit (Recommended)

from langchain_mrscraper import MrScraperToolkit

tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()

2. Convenience Loader

from langchain_mrscraper import load_mrscraper_tools

tools = load_mrscraper_tools(token="YOUR_MRSCRAPER_API_TOKEN")

3. Selective Tool Loading

from langchain_mrscraper import load_mrscraper_tools

tools = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=[
        "mrscraper_fetch_html",
        "mrscraper_get_result_by_id",
    ],
)

4. Per-Tool Constructors

from langchain_mrscraper.tools import MrScraperFetchHtmlTool

fetch_html = MrScraperFetchHtmlTool(token="YOUR_MRSCRAPER_API_TOKEN")

Tool Outputs

All tools return JSON strings (pretty-printed) from the MrScraper API, making them easy to parse and process in your agent workflows.
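Because each tool returns a JSON string rather than a Python object, parse it with json.loads before inspecting fields. The sample payload here is illustrative only, not the API's actual schema:

```python
import json

# Sample output string, standing in for what a tool's invoke() returns.
output = '{"status": "success", "data": {"title": "Example Domain"}}'

parsed = json.loads(output)        # str -> dict
title = parsed["data"]["title"]    # navigate the parsed structure
```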

Async Usage

All tools implement _arun for async callers. In async code, you can use the tool's async entry point when your LangChain version supports it:

import asyncio
from langchain_mrscraper import load_mrscraper_tools

async def async_scrape():
    fetch_html, = load_mrscraper_tools(
        token="YOUR_MRSCRAPER_API_TOKEN",
        tool_names=["mrscraper_fetch_html"],
    )
    
    # Use ainvoke for async execution
    output = await fetch_html.ainvoke({
        "url": "https://example.com",
        "timeout": 120,
    })
    print(output)

asyncio.run(async_scrape())

Async Support

Check your LangChain version to confirm ainvoke is available on BaseTool. Most recent versions include async support.

Best Practices

Token Management

Store your API token securely:

import os

# Load the token from a .env file (requires the python-dotenv package)
from dotenv import load_dotenv
load_dotenv()

token = os.getenv("MRSCRAPER_API_KEY")
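Since the integration accepts either MRSCRAPER_API_KEY or MRSCRAPER_API_TOKEN, a small helper (hypothetical, not part of the SDK) can check both and fail fast when neither is set, instead of surfacing later as an authentication error:

```python
import os

def require_token(*names):
    """Return the first non-empty value among the given env vars, or raise."""
    for name in names:
        value = os.getenv(name)
        if value:
            return value
    raise RuntimeError(f"Missing API token; set one of: {', '.join(names)}")

os.environ["MRSCRAPER_API_KEY"] = "demo-token"  # for demonstration only
token = require_token("MRSCRAPER_API_KEY", "MRSCRAPER_API_TOKEN")
```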

Error Handling

Implement error handling in your agent workflows:

from langchain_mrscraper import load_mrscraper_tools

try:
    fetch_html, = load_mrscraper_tools(
        token="YOUR_MRSCRAPER_API_TOKEN",
        tool_names=["mrscraper_fetch_html"],
    )
    
    output = fetch_html.invoke({"url": "https://example.com"})
    
except Exception as e:
    print(f"Scraping error: {e}")
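Transient network failures are often worth retrying with exponential backoff. The wrapper below is a generic sketch, not part of langchain-mrscraper; in practice you would pass it a zero-argument callable such as `lambda: fetch_html.invoke({"url": "https://example.com"})`:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stub that fails once before succeeding:
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```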

Tool Selection

Only load the tools your agent needs:

# Good: Only load needed tools
tools = load_mrscraper_tools(
    tool_names=["mrscraper_fetch_html", "mrscraper_create_scraper"]
)

# Avoid: Loading all tools when only using a few
tools = MrScraperToolkit().get_tools()  # All 8 tools
