LangChain

Use MrScraper as LangChain tools via the langchain-mrscraper package—fetch HTML, run AI and manual scrapers, and list results from agents.

The langchain-mrscraper package exposes the MrScraper API as LangChain BaseTool instances so agents can fetch rendered HTML, create and rerun scrapers, and get results. The SDK client (mrscraper-sdk) is installed as a dependency.

Tools, not a document loader

This integration is tools-first: each endpoint is an explicit tool an agent can call. For deterministic “URL → documents” ingestion into vector stores, a document loader is often a better fit; MrScraper may offer that in a separate package later.
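If you do want loader-style ingestion today, a thin wrapper over the fetch-HTML tool is enough. The sketch below is illustrative only: it stubs the fetcher and uses a minimal stand-in for a LangChain `Document`; it is not part of langchain-mrscraper.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    # Minimal stand-in for langchain_core.documents.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_urls(urls, fetch_html):
    # fetch_html is any callable mapping a URL to rendered HTML,
    # e.g. lambda url: fetch_tool.invoke({"url": url}).
    return [Document(page_content=fetch_html(u), metadata={"source": u}) for u in urls]

# Example with a stubbed fetcher (no network call):
docs = load_urls(["https://example.com"], lambda url: "<html>stub</html>")
print(docs[0].metadata["source"])
```

In real use you would pass a closure over the `mrscraper_fetch_html` tool instead of the stub.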

Installation

pip install -U langchain-mrscraper

Requirements

Python 3.9+. See langchain-mrscraper on PyPI for the latest version.

Authentication

Set your API key from the MrScraper app. You can learn how to generate an API token in the Generate Token guide.

import os

os.environ["MRSCRAPER_API_KEY"] = "YOUR_MRSCRAPER_API_TOKEN"

You can also pass token="..." or mrscraper_api_key="..." when building the toolkit or calling load_mrscraper_tools. Environment variables MRSCRAPER_API_KEY (preferred) or MRSCRAPER_API_TOKEN are read automatically.

Quick start

Load all tools with MrScraperToolkit:

from langchain_mrscraper import MrScraperToolkit

tools = MrScraperToolkit().get_tools()
# or: MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()

Or use the convenience loader:

from langchain_mrscraper import load_mrscraper_tools

tools = load_mrscraper_tools()

Use with an agent

Example with LangGraph’s prebuilt ReAct agent and OpenAI chat (install langgraph and langchain-openai separately):

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_mrscraper import MrScraperToolkit

tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools)

API styles

You can obtain tools in several ways:

  • MrScraperToolkit(...).get_tools() — recommended factory.
  • load_mrscraper_tools(...) — same configuration via a function; supports tool_names to return only selected tools.
  • Per-tool constructors — each tool class accepts token or mrscraper_api_key (and optional shared client).
  • Environment variables — MRSCRAPER_API_KEY or MRSCRAPER_API_TOKEN when you do not pass a token explicitly.

Restrict tools:

from langchain_mrscraper import load_mrscraper_tools

tools = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=[
        "mrscraper_fetch_html",
        "mrscraper_get_result_by_id",
    ],
)

Available tools

| Tool | Purpose |
| --- | --- |
| mrscraper_fetch_html | Rendered HTML via the stealth browser |
| mrscraper_create_scraper | Create and run an AI scraper from natural language |
| mrscraper_rerun_ai_scraper | Rerun an existing AI scraper on a new URL |
| mrscraper_bulk_rerun_ai_scraper | Rerun an AI scraper for many URLs |
| mrscraper_rerun_manual_scraper | Rerun a dashboard manual scraper on one URL |
| mrscraper_bulk_rerun_manual_scraper | Bulk rerun a manual scraper |
| mrscraper_get_all_results | Paginated list of results (sort, search, dates) |
| mrscraper_get_result_by_id | Fetch one result by ID |

Tool outputs are JSON strings (pretty-printed) returned from the MrScraper API.
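Because tool outputs are JSON strings, you typically parse them before further processing. A minimal example (the payload shape here is simulated, not the documented response schema):

```python
import json

# Simulated tool output; the real fields depend on the endpoint you call.
output = """{
  "id": "abc123",
  "status": "done"
}"""

data = json.loads(output)
print(data["status"])
```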

The examples below use load_mrscraper_tools with a single tool_name so each snippet is copy-pasteable. Replace placeholders like scraper and result IDs with values from your dashboard or prior API responses.

Fetch raw HTML (stealth browser)

Fetch fully rendered HTML after JavaScript execution.

from langchain_mrscraper import load_mrscraper_tools

fetch_html, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_fetch_html"],
)

output = fetch_html.invoke(
    {
        "url": "https://example.com/page",
        "timeout": 120,
        "geo_code": "US",
        "block_resources": False,
    }
)
print(output)

| Input | Description |
| --- | --- |
| url | Full URL to load |
| timeout | Max seconds to wait (default 120) |
| geo_code | Two-letter proxy country code (default "US") |
| block_resources | Block images/CSS/fonts for faster loads (default False) |

Create AI scraper

Create and run an AI scraper from natural-language instructions. The response includes scraper metadata (including an ID for reruns).

from langchain_mrscraper import load_mrscraper_tools

create, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_create_scraper"],
)

output = create.invoke(
    {
        "url": "https://example.com/products",
        "message": "Extract product names, prices, and ratings.",
        "agent": "listing",  # "general" | "listing" | "map"
        "proxy_country": "US",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)

| Input | Description |
| --- | --- |
| url | Target URL |
| message | Natural-language extraction instructions |
| agent | general, listing, or map |
| proxy_country | Optional two-letter country code |
| max_depth, max_pages, limit, include_patterns, exclude_patterns | Map mode and filtering (see AI scraper docs) |

For how agents differ, see AI Scraper agents. The REST surface is aligned with AI scraper init in the API reference.

Rerun AI Scraper

Rerun an existing AI scraper on a new URL. Pass the scraper_id of a scraper created on the MrScraper platform or returned by mrscraper_create_scraper; this reuses the same extraction logic without creating a new scraper.

from langchain_mrscraper import load_mrscraper_tools

rerun_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_ai_scraper"],
)

output = rerun_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "url": "https://example.com/another-page",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)

Bulk rerun (AI scraper)

Run an existing AI scraper on multiple URLs in one request. Instead of calling mrscraper_rerun_ai_scraper once per URL, pass a list of URLs and process them in a single call, which is faster and reduces API usage.

from langchain_mrscraper import load_mrscraper_tools

bulk_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_ai_scraper"],
)

output = bulk_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "urls": [
            "https://example.com/a",
            "https://example.com/b",
        ],
    }
)
print(output)

The response contains consolidated job metadata; read the results field for per-URL outcomes.

Rerun manual scraper

Rerun an existing manual scraper on a new URL. Pass the scraper_id of a manual scraper created on the MrScraper platform; this reuses the same extraction configuration without creating a new scraper.

from langchain_mrscraper import load_mrscraper_tools

rerun_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_manual_scraper"],
)

output = rerun_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "url": "https://example.com/target",
    }
)
print(output)

Bulk rerun (manual scraper)

Run an existing manual scraper on multiple URLs in one request. Instead of calling mrscraper_rerun_manual_scraper once per URL, pass a list of URLs and process them in a single call, which is faster and reduces API usage.

from langchain_mrscraper import load_mrscraper_tools

bulk_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_manual_scraper"],
)

output = bulk_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "urls": [
            "https://example.com/one",
            "https://example.com/two",
        ],
    }
)
print(output)

The response includes bulk job information (for example job ID and status). Poll or list results with mrscraper_get_all_results as needed.
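One way to wait for a bulk job is a simple polling loop over a status-returning callable. The helper below is a generic sketch; the terminal status strings are assumptions, not the documented response schema:

```python
import time

def wait_for_job(get_status, poll_interval=5.0, timeout=300.0):
    """Poll get_status() until it returns a terminal state or timeout expires.

    get_status is any zero-argument callable returning a status string,
    e.g. one that calls the mrscraper_get_all_results tool and inspects
    the matching job entry.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("done", "failed"):  # assumed terminal states
            return status
        time.sleep(poll_interval)
    raise TimeoutError("bulk job did not finish in time")

# Example with a stubbed status source (no API calls):
states = iter(["running", "running", "done"])
print(wait_for_job(lambda: next(states), poll_interval=0.0))
```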

Get All Results in Range

List scraping results with pagination, sorting, optional search, and optional date range.

from langchain_mrscraper import load_mrscraper_tools

list_results, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_all_results"],
)

output = list_results.invoke(
    {
        "sort_field": "updatedAt",
        "sort_order": "DESC",
        "page_size": 10,
        "page": 1,
        "search": None,
        "date_range_column": None,
        "start_at": None,
        "end_at": None,
    }
)
print(output)

| Input | Description |
| --- | --- |
| sort_field | One of: createdAt, updatedAt, id, type, url, status, error, tokenUsage, runtime |
| sort_order | ASC or DESC |
| page_size, page | Page size and page number (starting at 1) |
| search | Optional free-text filter |
| date_range_column, start_at, end_at | Optional ISO-8601 range filtering |

Get Result by ID

Fetch a single result record by its MrScraper result ID.

from langchain_mrscraper import load_mrscraper_tools

get_one, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_result_by_id"],
)

output = get_one.invoke({"result_id": "YOUR_RESULT_ID"})
print(output)

Async usage

Tools implement _arun for async callers. In async code you can use the tool’s async entry point (e.g. await tool.ainvoke({...})) when your LangChain version exposes ainvoke on BaseTool.
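For instance, several ainvoke calls can run concurrently with asyncio.gather. The snippet below stubs the tool with a coroutine so it runs standalone; with real tools you would await tool.ainvoke({...}) instead:

```python
import asyncio

async def fake_ainvoke(payload):
    # Stand-in for tool.ainvoke({...}); replace with a real tool in practice.
    await asyncio.sleep(0)
    return f"fetched {payload['url']}"

async def main():
    urls = ["https://example.com/a", "https://example.com/b"]
    # Fan out all calls concurrently and collect results in order.
    return await asyncio.gather(*(fake_ainvoke({"url": u}) for u in urls))

print(asyncio.run(main()))
```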
