LangChain SDK
Use MrScraper as LangChain tools via the langchain-mrscraper package—fetch HTML, run AI and manual scrapers, and list results from agents.
The langchain-mrscraper package exposes the MrScraper API as LangChain BaseTool instances, enabling AI agents to fetch rendered HTML, create and rerun scrapers, and retrieve results. The SDK client (mrscraper-sdk) is installed as a dependency.
Tools-First Integration
This integration provides tools that agents can explicitly call—not a document loader. For deterministic "URL → documents" ingestion into vector stores, a document loader is typically a better fit. MrScraper may offer that in a separate package later.
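Until a dedicated loader exists, you can approximate "URL → documents" ingestion yourself by wrapping tool output in document-shaped records. The sketch below uses plain dicts instead of LangChain's `Document` class to stay dependency-free; the `fetch_page` stub is a hypothetical stand-in for invoking the `mrscraper_fetch_html` tool.

```python
# Sketch: turning fetched HTML into document-shaped records for ingestion.
# `fetch_page` is a placeholder for fetch_html.invoke({"url": url});
# swap in the real tool call in your own pipeline.

def fetch_page(url: str) -> str:
    """Placeholder for the mrscraper_fetch_html tool."""
    return f"<html><body>Content of {url}</body></html>"

def load_documents(urls: list[str]) -> list[dict]:
    """Build Document-like dicts (page_content + metadata) from URLs."""
    return [
        {"page_content": fetch_page(url), "metadata": {"source": url}}
        for url in urls
    ]

docs = load_documents(["https://example.com/a", "https://example.com/b"])
print(docs[0]["metadata"]["source"])  # https://example.com/a
```

In a real pipeline you would replace the dicts with `langchain_core.documents.Document` instances before handing them to a vector store.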
Requirements
- Python 3.9 or higher
- LangChain ecosystem packages (installed as dependencies)
Package Information
See langchain-mrscraper on PyPI for the latest release, version history, and installation details.
Installation
Install the package from PyPI using pip:
pip install -U langchain-mrscraper

This installs both the LangChain integration and the underlying MrScraper SDK client.
Authentication
Set your API key from the MrScraper dashboard as an environment variable:
import os
os.environ["MRSCRAPER_API_KEY"] = "YOUR_MRSCRAPER_API_TOKEN"

Authentication Options
You can authenticate in three ways:
- Environment variable (recommended): MRSCRAPER_API_KEY or MRSCRAPER_API_TOKEN
- Toolkit parameter: MrScraperToolkit(token="...")
- Function parameter: load_mrscraper_tools(mrscraper_api_key="...")
For more information on generating API tokens, see the Generate Token guide.
Quick Start
Using MrScraperToolkit
Load all available tools with the toolkit:
from langchain_mrscraper import MrScraperToolkit
# Using environment variable
tools = MrScraperToolkit().get_tools()
# Or with explicit token
tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()

Using Convenience Loader
Alternatively, use the convenience function:
from langchain_mrscraper import load_mrscraper_tools
tools = load_mrscraper_tools()

Using with AI Agents
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_mrscraper import MrScraperToolkit
# Load MrScraper tools
tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()
# Create agent with OpenAI
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools)
# Use the agent
response = agent.invoke({
    "messages": [{"role": "user", "content": "Fetch the HTML from https://example.com"}]
})

Prerequisites
This example requires langgraph and langchain-openai packages:
pip install langgraph langchain-openai

Core Methods
The examples below demonstrate each tool individually. Replace placeholders such as scraper IDs and result IDs with values from your dashboard or prior API responses.
Fetch Raw HTML
Fetch fully rendered HTML after JavaScript execution using the stealth browser.
from langchain_mrscraper import load_mrscraper_tools
fetch_html, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_fetch_html"],
)
output = fetch_html.invoke(
    {
        "url": "https://example.com/page",
        "timeout": 120,
        "geo_code": "US",
        "block_resources": False,
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | - | Full URL to load |
timeout | integer | No | 120 | Maximum seconds to wait for page load |
geo_code | string | No | "US" | Two-letter proxy country code |
block_resources | boolean | No | False | Block images, CSS, and fonts for faster loading |
Create AI Scraper
Create and run an AI scraper using natural language instructions.
from langchain_mrscraper import load_mrscraper_tools
create, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_create_scraper"],
)
output = create.invoke(
    {
        "url": "https://example.com/products",
        "message": "Extract product names, prices, and ratings.",
        "agent": "listing",  # "general" | "listing" | "map"
        "proxy_country": "US",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | - | Target URL to scrape |
message | string | Yes | - | Natural language extraction instructions |
agent | string | No | "general" | Agent type: "general", "listing", or "map" |
proxy_country | string | No | - | Two-letter country code for proxy |
max_depth | integer | No | 2 | For map agent: link depth to follow |
max_pages | integer | No | 50 | For map/listing agents: maximum pages to scrape |
limit | integer | No | 1000 | For map agent: maximum results to return |
include_patterns | string | No | "" | For map agent: regex patterns for URLs to include |
exclude_patterns | string | No | "" | For map agent: regex patterns for URLs to exclude |
Agent Types:
| Agent | Best For |
|---|---|
"general" | Single pages, product details, articles |
"listing" | Product listings, search results, directories |
"map" | Site crawling, URL discovery, sitemaps |
For detailed information on agents, see AI Scraper Agents.
Rerun AI Scraper
Rerun an existing AI scraper on a new URL without creating a new scraper configuration.
from langchain_mrscraper import load_mrscraper_tools
rerun_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_ai_scraper"],
)
output = rerun_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "url": "https://example.com/another-page",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scraper_id | string | Yes | ID of the existing AI scraper |
url | string | Yes | New URL to scrape |
max_depth | integer | No | For map agents: link depth to follow |
max_pages | integer | No | For map/listing agents: maximum pages to scrape |
limit | integer | No | For map agents: maximum results to return |
include_patterns | string | No | For map agents: URL patterns to include |
exclude_patterns | string | No | For map agents: URL patterns to exclude |
Bulk Rerun AI Scraper
Run an existing AI scraper on multiple URLs in a single request.
from langchain_mrscraper import load_mrscraper_tools
bulk_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_ai_scraper"],
)
output = bulk_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "urls": [
            "https://example.com/a",
            "https://example.com/b",
        ],
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scraper_id | string | Yes | ID of the existing AI scraper |
urls | array | Yes | List of URLs to scrape |
Performance Tip
This is more efficient than calling mrscraper_rerun_ai_scraper once per URL. Use Get Result by ID to retrieve individual results.
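Very large URL lists may still need to be split across several bulk calls. The helper below sketches that; the chunk size of 100 is an illustrative assumption, not a documented API limit, so check your plan's actual limits.

```python
# Split a large URL list into batches for multiple bulk rerun calls.
# The batch size of 100 is an assumption for illustration, not an API limit.

def chunk_urls(urls: list[str], size: int = 100) -> list[list[str]]:
    return [urls[i : i + size] for i in range(0, len(urls), size)]

urls = [f"https://example.com/p/{i}" for i in range(250)]
batches = chunk_urls(urls)
print(len(batches))  # 3

# Each batch can then be passed as the "urls" argument of one bulk call:
# bulk_ai.invoke({"scraper_id": "...", "urls": batch})
```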
Rerun Manual Scraper
Rerun an existing manual scraper (created in the MrScraper dashboard) on a new URL.
from langchain_mrscraper import load_mrscraper_tools
rerun_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_manual_scraper"],
)
output = rerun_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "url": "https://example.com/target",
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scraper_id | string | Yes | ID of the manual scraper from the dashboard |
url | string | Yes | URL to scrape |
Bulk Rerun Manual Scraper
Run an existing manual scraper on multiple URLs in a single request.
from langchain_mrscraper import load_mrscraper_tools
bulk_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_manual_scraper"],
)
output = bulk_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "urls": [
            "https://example.com/one",
            "https://example.com/two",
        ],
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scraper_id | string | Yes | ID of the manual scraper from the dashboard |
urls | array | Yes | List of URLs to scrape |
Performance Tip
This is more efficient than calling mrscraper_rerun_manual_scraper once per URL. Use Get Result by ID to retrieve individual results.
Retrieving Results
Get All Results
Retrieve a paginated list of scraping results with filtering and sorting options.
from langchain_mrscraper import load_mrscraper_tools
list_results, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_all_results"],
)
output = list_results.invoke(
    {
        "sort_field": "updatedAt",
        "sort_order": "DESC",
        "page_size": 10,
        "page": 1,
        "search": None,
        "date_range_column": None,
        "start_at": None,
        "end_at": None,
    }
)
print(output)

Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
sort_field | string | No | "updatedAt" | Field to sort by |
sort_order | string | No | "DESC" | Sort direction: "ASC" or "DESC" |
page_size | integer | No | 10 | Number of results per page |
page | integer | No | 1 | Page number (starting at 1) |
search | string | No | None | Optional free-text filter |
date_range_column | string | No | None | Date field to filter by |
start_at | string | No | None | Start date (ISO 8601 format) |
end_at | string | No | None | End date (ISO 8601 format) |
Supported sort_field values:
"createdAt", "updatedAt", "id", "type", "url", "status", "error", "tokenUsage", "runtime"
Get Result by ID
Fetch a single scraping result using its unique ID.
from langchain_mrscraper import load_mrscraper_tools
get_one, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_result_by_id"],
)
output = get_one.invoke({"result_id": "YOUR_RESULT_ID"})
print(output)

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
result_id | string | Yes | Unique identifier of the result |
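When a scrape runs asynchronously, an agent may need to poll a result until it finishes. The `status` field and its values below are assumptions for illustration, not documented payload fields; the `get_result` stub stands in for `get_one.invoke`.

```python
import json
import time

# Sketch: poll a result until it leaves a pending state.
# The "status" field and its values are assumed -- inspect real
# tool output and adjust the condition accordingly.

def get_result(result_id: str, _state={"calls": 0}) -> str:
    """Stand-in for get_one.invoke({"result_id": result_id})."""
    _state["calls"] += 1  # simulate a job that finishes on the third poll
    status = "completed" if _state["calls"] >= 3 else "pending"
    return json.dumps({"id": result_id, "status": status})

def wait_for_result(result_id: str, interval: float = 0.0, attempts: int = 10) -> dict:
    for _ in range(attempts):
        result = json.loads(get_result(result_id))
        if result.get("status") != "pending":
            return result
        time.sleep(interval)
    raise TimeoutError(f"result {result_id} still pending after {attempts} attempts")

print(wait_for_result("abc123")["status"])  # completed
```

In production, use a real `interval` (a few seconds) so you do not hammer the API.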
Tool Loading Options
You can obtain tools in several ways:
1. Toolkit (Recommended)
from langchain_mrscraper import MrScraperToolkit
tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()

2. Convenience Loader
from langchain_mrscraper import load_mrscraper_tools
tools = load_mrscraper_tools(token="YOUR_MRSCRAPER_API_TOKEN")

3. Selective Tool Loading
from langchain_mrscraper import load_mrscraper_tools
tools = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=[
        "mrscraper_fetch_html",
        "mrscraper_get_result_by_id",
    ],
)

4. Per-Tool Constructors
from langchain_mrscraper.tools import MrScraperFetchHtmlTool
fetch_html = MrScraperFetchHtmlTool(token="YOUR_MRSCRAPER_API_TOKEN")

Tool Outputs
All tools return JSON strings (pretty-printed) from the MrScraper API, making them easy to parse and process in your agent workflows.
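Because every tool returns a JSON string, you can turn the output into Python objects with the standard library. The field names below (`status`, `data`) are illustrative assumptions about the payload shape, not documented keys:

```python
import json

# Tools return pretty-printed JSON strings; parse them with json.loads.
# The keys used here are illustrative -- real payloads depend on the endpoint.
output = """{
  "status": "completed",
  "data": [{"name": "Widget", "price": 9.99}]
}"""

result = json.loads(output)
print(result["status"])            # completed
print(result["data"][0]["price"])  # 9.99
```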
Async Usage
All tools implement _arun for async callers. In async code, you can use the tool's async entry point when your LangChain version supports it:
import asyncio
from langchain_mrscraper import load_mrscraper_tools
async def async_scrape():
    fetch_html, = load_mrscraper_tools(
        token="YOUR_MRSCRAPER_API_TOKEN",
        tool_names=["mrscraper_fetch_html"],
    )
    # Use ainvoke for async execution
    output = await fetch_html.ainvoke({
        "url": "https://example.com",
        "timeout": 120,
    })
    print(output)

asyncio.run(async_scrape())

Async Support
Check your LangChain version to confirm ainvoke is available on BaseTool. Most recent versions include async support.
Best Practices
Token Management
Store your API token securely:
import os

# Load from .env file
from dotenv import load_dotenv

load_dotenv()
token = os.getenv("MRSCRAPER_API_KEY")

Error Handling
Implement error handling in your agent workflows:
from langchain_mrscraper import load_mrscraper_tools
try:
    fetch_html, = load_mrscraper_tools(
        token="YOUR_MRSCRAPER_API_TOKEN",
        tool_names=["mrscraper_fetch_html"],
    )
    output = fetch_html.invoke({"url": "https://example.com"})
except Exception as e:
    print(f"Scraping error: {e}")

Tool Selection
Only load the tools your agent needs:
# Good: Only load needed tools
tools = load_mrscraper_tools(
    tool_names=["mrscraper_fetch_html", "mrscraper_create_scraper"]
)
# Avoid: Loading all tools when only using a few
tools = MrScraperToolkit().get_tools() # All 8 tools