LangChain
Use MrScraper as LangChain tools via the langchain-mrscraper package—fetch HTML, run AI and manual scrapers, and list results from agents.
The langchain-mrscraper package exposes the MrScraper API as LangChain BaseTool instances so agents can fetch rendered HTML, create and rerun scrapers, and get results. The SDK client (mrscraper-sdk) is installed as a dependency.
Tools, not a document loader
This integration is tools-first: each endpoint is an explicit tool an agent can call. For deterministic “URL → documents” ingestion into vector stores, a document loader is often a better fit; MrScraper may offer that in a separate package later.
Installation
```shell
pip install -U langchain-mrscraper
```

Requirements
Python 3.9+. See langchain-mrscraper on PyPI for the latest version.
Authentication
Set your API key from the MrScraper app. You can learn how to generate an API token in the Generate Token guide.
```python
import os

os.environ["MRSCRAPER_API_KEY"] = "YOUR_MRSCRAPER_API_TOKEN"
```

You can also pass token="..." or mrscraper_api_key="..." when building the toolkit or calling load_mrscraper_tools. The environment variables MRSCRAPER_API_KEY (preferred) and MRSCRAPER_API_TOKEN are read automatically.
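As an illustration of that lookup order, the sketch below mirrors the documented behavior (an explicit token wins, then the two environment variables). It is not the package's actual internals, and resolve_token is a hypothetical helper name:

```python
import os

# Hypothetical helper illustrating the documented token lookup order:
# an explicit token wins, then MRSCRAPER_API_KEY, then MRSCRAPER_API_TOKEN.
def resolve_token(explicit=None):
    return (
        explicit
        or os.environ.get("MRSCRAPER_API_KEY")
        or os.environ.get("MRSCRAPER_API_TOKEN")
    )
```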
Quick start
Load all tools with MrScraperToolkit:
```python
from langchain_mrscraper import MrScraperToolkit

tools = MrScraperToolkit().get_tools()
# or: MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()
```

Or use the convenience loader:

```python
from langchain_mrscraper import load_mrscraper_tools

tools = load_mrscraper_tools()
```

Use with an agent
Example with LangGraph’s prebuilt ReAct agent and OpenAI chat (install langgraph and langchain-openai separately):
```python
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_mrscraper import MrScraperToolkit

tools = MrScraperToolkit(token="YOUR_MRSCRAPER_API_TOKEN").get_tools()
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools)
```

API styles
You can obtain tools in several ways:
- MrScraperToolkit(...).get_tools() — the recommended factory.
- load_mrscraper_tools(...) — the same configuration via a function; supports tool_names to return only selected tools.
- Per-tool constructors — each tool class accepts token or mrscraper_api_key (and an optional shared client).
- Environment — MRSCRAPER_API_KEY or MRSCRAPER_API_TOKEN when you do not pass a token explicitly.
Restrict tools:
```python
from langchain_mrscraper import load_mrscraper_tools

tools = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=[
        "mrscraper_fetch_html",
        "mrscraper_get_result_by_id",
    ],
)
```

Available tools
| Tool | Purpose |
|---|---|
| mrscraper_fetch_html | Rendered HTML via the stealth browser |
| mrscraper_create_scraper | Create and run an AI scraper from natural language |
| mrscraper_rerun_ai_scraper | Rerun an existing AI scraper on a new URL |
| mrscraper_bulk_rerun_ai_scraper | Rerun an AI scraper for many URLs |
| mrscraper_rerun_manual_scraper | Rerun a dashboard manual scraper on one URL |
| mrscraper_bulk_rerun_manual_scraper | Bulk rerun a manual scraper |
| mrscraper_get_all_results | Paginated list of results (sort, search, dates) |
| mrscraper_get_result_by_id | Fetch one result by ID |
Tool outputs are JSON strings (pretty-printed) returned from the MrScraper API.
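Because every tool returns a string rather than a parsed object, decode it with json.loads before using fields programmatically. The response shape below is illustrative only; actual field names come from the MrScraper API.

```python
import json

# Illustrative tool output; real responses come from the MrScraper API
# and may use different field names.
raw_output = """{
  "id": "res_123",
  "status": "finished",
  "data": [{"name": "Widget", "price": 19.99}]
}"""

result = json.loads(raw_output)
if result.get("status") == "finished":
    for row in result.get("data", []):
        print(row["name"], row["price"])
```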
The examples below use load_mrscraper_tools with a single tool_name so each snippet is copy-pasteable. Replace placeholders like scraper and result IDs with values from your dashboard or prior API responses.
Fetch raw HTML (stealth browser)
Fetch fully rendered HTML after JavaScript execution.
```python
from langchain_mrscraper import load_mrscraper_tools

fetch_html, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_fetch_html"],
)

output = fetch_html.invoke(
    {
        "url": "https://example.com/page",
        "timeout": 120,
        "geo_code": "US",
        "block_resources": False,
    }
)
print(output)
```

| Input | Description |
|---|---|
| url | Full URL to load |
| timeout | Max seconds to wait (default 120) |
| geo_code | Two-letter proxy country code (default "US") |
| block_resources | Block images/CSS/fonts for faster loads (default False) |
Create AI scraper
Create and run an AI scraper from natural-language instructions. The response includes scraper metadata (including an ID for reruns).
```python
from langchain_mrscraper import load_mrscraper_tools

create, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_create_scraper"],
)

output = create.invoke(
    {
        "url": "https://example.com/products",
        "message": "Extract product names, prices, and ratings.",
        "agent": "listing",  # "general" | "listing" | "map"
        "proxy_country": "US",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)
```

| Input | Description |
|---|---|
| url | Target URL |
| message | Natural-language extraction instructions |
| agent | general, listing, or map |
| proxy_country | Optional two-letter country code |
| max_depth, max_pages, limit, include_patterns, exclude_patterns | Map mode and filtering (see AI scraper docs) |
For how agents differ, see AI Scraper agents. The REST surface is aligned with AI scraper init in the API reference.
Rerun AI Scraper
This tool reruns an existing AI scraper (one already created on the MrScraper platform, or one you just created and whose scraper_id you captured) on a new URL, letting you reuse the same scraper without creating a new one.
```python
from langchain_mrscraper import load_mrscraper_tools

rerun_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_ai_scraper"],
)

output = rerun_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "url": "https://example.com/another-page",
        "max_depth": 2,
        "max_pages": 50,
        "limit": 1000,
        "include_patterns": "",
        "exclude_patterns": "",
    }
)
print(output)
```

Bulk rerun (AI scraper)
Bulk rerun lets you run an existing AI scraper on multiple URLs in one request. Instead of calling mrscraper_rerun_ai_scraper once per URL, pass a list and process them all at once; this is faster and reduces API calls.
```python
from langchain_mrscraper import load_mrscraper_tools

bulk_ai, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_ai_scraper"],
)

output = bulk_ai.invoke(
    {
        "scraper_id": "YOUR_AI_SCRAPER_ID",
        "urls": [
            "https://example.com/a",
            "https://example.com/b",
        ],
    }
)
print(output)
```

Consolidated job metadata is in the response; use the results tools to read per-URL outcomes.
Rerun manual scraper
This tool reruns an existing manual scraper (one already created on the MrScraper platform, or one you created first and whose scraper_id you captured) on a new URL, letting you reuse the same scraper without creating a new one.
```python
from langchain_mrscraper import load_mrscraper_tools

rerun_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_rerun_manual_scraper"],
)

output = rerun_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "url": "https://example.com/target",
    }
)
print(output)
```

Bulk rerun (manual scraper)
Bulk rerun lets you run an existing manual scraper on multiple URLs in one request. Instead of calling mrscraper_rerun_manual_scraper once per URL, pass a list and process them all at once; this is faster and reduces API calls.
```python
from langchain_mrscraper import load_mrscraper_tools

bulk_manual, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_bulk_rerun_manual_scraper"],
)

output = bulk_manual.invoke(
    {
        "scraper_id": "YOUR_MANUAL_SCRAPER_ID",
        "urls": [
            "https://example.com/one",
            "https://example.com/two",
        ],
    }
)
print(output)
```

The response includes bulk job information (for example job id and status). Poll or list results with get_all_results as needed.
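If you need to wait for a specific result after kicking off a job, a small polling helper can wrap the mrscraper_get_result_by_id tool. This is a sketch only: the "status" values it checks ("finished", "failed") are assumptions about the response shape, so adjust them to what your account actually returns.

```python
import json
import time

def wait_for_result(get_result_tool, result_id, timeout=300, interval=5):
    """Poll mrscraper_get_result_by_id until a terminal status.

    The "status" values checked below are assumptions about the
    MrScraper response shape; adapt them to your actual payloads.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        result = json.loads(get_result_tool.invoke({"result_id": result_id}))
        if result.get("status") in ("finished", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"result {result_id} not done after {timeout}s")
```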
Get All Results in Range
List scraping results with pagination, sorting, optional search, and optional date range.
```python
from langchain_mrscraper import load_mrscraper_tools

list_results, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_all_results"],
)

output = list_results.invoke(
    {
        "sort_field": "updatedAt",
        "sort_order": "DESC",
        "page_size": 10,
        "page": 1,
        "search": None,
        "date_range_column": None,
        "start_at": None,
        "end_at": None,
    }
)
print(output)
```

| Input | Description |
|---|---|
| sort_field | One of: createdAt, updatedAt, id, type, url, status, error, tokenUsage, runtime |
| sort_order | ASC or DESC |
| page_size, page | Page size and page number (starting at 1) |
| search | Optional free-text filter |
| date_range_column, start_at, end_at | Optional ISO-8601 range filtering |
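To walk every page of results, you can loop until a page comes back empty. This sketch takes the mrscraper_get_all_results tool as an argument; the "data" key it reads, and the stop-on-empty-page rule, are assumptions about the response shape.

```python
import json

def iter_all_results(list_results_tool, page_size=10):
    """Yield result records page by page until an empty page is returned.

    `list_results_tool` is the mrscraper_get_all_results tool; the
    "data" key below is an assumption about the response shape.
    """
    page = 1
    while True:
        raw = list_results_tool.invoke({
            "sort_field": "updatedAt",
            "sort_order": "DESC",
            "page_size": page_size,
            "page": page,
        })
        records = json.loads(raw).get("data", [])
        if not records:
            return
        yield from records
        page += 1
```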
Get Result by ID
Fetch a single result record by its MrScraper result ID.
```python
from langchain_mrscraper import load_mrscraper_tools

get_one, = load_mrscraper_tools(
    token="YOUR_MRSCRAPER_API_TOKEN",
    tool_names=["mrscraper_get_result_by_id"],
)

output = get_one.invoke({"result_id": "YOUR_RESULT_ID"})
print(output)
```

Async usage
Tools implement _arun for async callers. In async code you can use the tool’s async entry point (e.g. await tool.ainvoke({...})) when your LangChain version exposes ainvoke on BaseTool.
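For example, you could fan out several fetches concurrently with asyncio.gather. This is a sketch that assumes ainvoke is available on your tools:

```python
import asyncio

async def fetch_many(fetch_html_tool, urls):
    """Fetch several pages concurrently via the tool's async entry point.

    Assumes your LangChain version exposes `ainvoke` on the tool.
    """
    tasks = [fetch_html_tool.ainvoke({"url": u}) for u in urls]
    return await asyncio.gather(*tasks)
```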
See also
- Python SDK — direct MrScraper client usage
- PyPI: langchain-mrscraper