Bulk Scraping

Learn how to efficiently scrape multiple URLs using our bulk scraping features.

MrScraper's Bulk Scraping feature allows you to extract data from multiple URLs in a single operation. Instead of scraping URLs one at a time, you can upload a list and apply the same scraping configuration to all of them, making it easy to gather large datasets efficiently.

Best For

Scraping multiple pages with similar structures, such as product listings, article archives, user profiles, or directory entries.

How to Configure Bulk Scraping

Step 1: Prepare Your URLs

You have two options for providing URLs:

Option A: Excel File Upload

Create an Excel file (.xlsx or .xls) with a column header named url or URL. Each row should contain one URL.

Option B: Direct Input

Paste URLs directly into the text area, one URL per line.
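If your URLs come from another tool, it can help to clean them before pasting. The following is a minimal local sketch, not part of MrScraper: the helper name `prepare_url_list` is hypothetical, and the rules it applies (trim whitespace, skip non-HTTP entries, drop exact duplicates) are assumptions about what a tidy list looks like rather than documented requirements.

```python
from urllib.parse import urlsplit


def prepare_url_list(raw_urls):
    """Return cleaned, de-duplicated URLs joined one per line.

    Hypothetical helper: the filtering rules are assumptions,
    not documented MrScraper validation behavior.
    """
    seen = set()
    cleaned = []
    for raw in raw_urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines
        parts = urlsplit(url)
        # Keep only absolute http(s) URLs that include a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url in seen:
            continue  # drop exact duplicates, keeping the first occurrence
        seen.add(url)
        cleaned.append(url)
    return "\n".join(cleaned)


if __name__ == "__main__":
    print(prepare_url_list([
        "https://books.toscrape.com/catalogue/page-1.html",
        "  https://books.toscrape.com/catalogue/page-2.html  ",
        "",
        "not a url",
        "https://books.toscrape.com/catalogue/page-1.html",
    ]))
```

The printed text can be pasted into the text area as-is.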

Step 2: Upload Your URLs

  1. Open a new or existing scraper in your MrScraper dashboard
  2. Click the Multiple URLs button in the top section
  3. Choose your input method:
    • Click Upload File to select an Excel file from your computer
    • Or paste URLs directly into the text area
  4. Click Save to confirm your URL list

Step 3: Run the Bulk Scrape

  1. Click Run All to start the bulk scraping process
  2. Click Result to view the scraper’s progress and results in real time

Token Usage

Each URL in your bulk scrape consumes tokens. Ensure you have sufficient tokens in your account before starting a large bulk operation. Check your token balance in your account settings.

Step 4: Monitor Progress

The Result page displays real-time progress information:

While Scraping:

In Progress
{
  "mergedData": null,
  "urlDetails": [],
  "summary": {
    "totalUrls": 4,
    "successfulUrls": 0,
    "failedUrls": 0,
    "scrapedCount": 0,
    "totalTokenUsage": 0,
    "estimatedFinishAt": null
  }
}

Progress Indicators:

Field              Description
totalUrls          Total number of URLs being scraped
successfulUrls     Number of URLs scraped successfully
failedUrls         Number of URLs that encountered errors
scrapedCount       Current number of completed scrapes
totalTokenUsage    Total tokens consumed so far
estimatedFinishAt  Estimated completion time
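If you copy the in-progress JSON into a script, the summary fields can be turned into a one-line status. This is a rough sketch using only the field names shown above; the function names `progress_percent` and `describe_progress` are hypothetical, and treating `scrapedCount / totalUrls` as the completion ratio is an assumption based on the field descriptions.

```python
import json


def progress_percent(result):
    """Share of URLs completed so far, as a percentage.

    Assumes scrapedCount counts every finished URL, whether it
    succeeded or failed (an interpretation, not a documented rule).
    """
    summary = result["summary"]
    total = summary["totalUrls"]
    if not total:
        return 0.0  # avoid dividing by zero for an empty job
    return 100.0 * summary["scrapedCount"] / total


def describe_progress(result):
    """Format the summary block as a short human-readable line."""
    s = result["summary"]
    return (
        f"{s['scrapedCount']}/{s['totalUrls']} done "
        f"({progress_percent(result):.0f}%), "
        f"{s['successfulUrls']} ok, {s['failedUrls']} failed, "
        f"{s['totalTokenUsage']} tokens used"
    )


if __name__ == "__main__":
    in_progress = json.loads("""
    {
      "mergedData": null,
      "urlDetails": [],
      "summary": {
        "totalUrls": 4,
        "successfulUrls": 0,
        "failedUrls": 0,
        "scrapedCount": 0,
        "totalTokenUsage": 0,
        "estimatedFinishAt": null
      }
    }
    """)
    print(describe_progress(in_progress))
```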

Once scraping completes, the Result page displays all extracted data:

Completed Results
[
  {
    "1": {
      "id": "a897fe39b1053632",
      "name": "A Light in the Attic",
      "price": "£51.77",
      "rating": null,
      "source": "product",
      "features": [
        "Classic collection of poetry and drawings from Shel Silverstein",
        "20th anniversary special edition",
        "Humorous and creative verse"
      ],
      "return_policy": null,
      "shipping_info": {
        "availability": "In stock (22 available)",
        "shipping_arrival": null
      },
      "specifications": {
        "UPC": "a897fe39b1053632",
        "Product Type": "Books",
        "Price (excl. tax)": "£51.77",
        "Price (incl. tax)": "£51.77",
        "Tax": "£0.00",
        "Availability": "In stock (22 available)",
        "Number of reviews": "0"
      }
    }
  },
  {
    "1": {
      "id": "90fa61229261140a",
      "name": "Tipping the Velvet",
      "price": "£53.74",
      "rating": null,
      "source": "product",
      "features": [
        "Erotic and absorbing...Written with starling power.",
        "Nan King, an oyster girl, is captivated by the music hall phenomenon Kitty Butler"
      ],
      "shipping_info": {
        "availability": "In stock (20 available)",
        "shipping_arrival": null
      },
      "specifications": {
        "UPC": "90fa61229261140a",
        "Product Type": "Books",
        "Price (excl. tax)": "£53.74",
        "Price (incl. tax)": "£53.74",
        "Tax": "£0.00",
        "Availability": "In stock (20 available)",
        "Number of reviews": "0"
      }
    }
  }
]
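The completed results are a list in which each entry maps a key such as "1" to one extracted record. As a sketch of post-processing, the hypothetical helper `flatten_results` below turns that shape into flat rows. The field names (`id`, `name`, `price`, `shipping_info.availability`) mirror the example above and are assumptions: your own scraper configuration may return different fields.

```python
def flatten_results(results):
    """Flatten bulk results into a list of simple row dictionaries.

    Assumes the list-of-objects shape shown in the example output;
    the selected fields are illustrative, not a fixed schema.
    """
    rows = []
    for entry in results:
        # Each entry maps a key such as "1" to one extracted record.
        for record in entry.values():
            if not isinstance(record, dict):
                continue  # ignore anything that is not a record
            shipping = record.get("shipping_info") or {}
            rows.append({
                "id": record.get("id"),
                "name": record.get("name"),
                "price": record.get("price"),
                "availability": shipping.get("availability"),
            })
    return rows


if __name__ == "__main__":
    sample = [
        {"1": {"id": "a897fe39b1053632", "name": "A Light in the Attic",
               "price": "£51.77",
               "shipping_info": {"availability": "In stock (22 available)"}}},
        {"1": {"id": "90fa61229261140a", "name": "Tipping the Velvet",
               "price": "£53.74",
               "shipping_info": {"availability": "In stock (20 available)"}}},
    ]
    for row in flatten_results(sample):
        print(row["name"], row["price"], row["availability"])
```

Missing fields come back as `None`, so records with different shapes do not raise errors.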

For more details, refer to the Bulk Scraping API.

Cancel Bulk Scraping

You can cancel a bulk scraping job using either the Scraper ID or the Result ID.

Option 1: Cancel by Scraper ID

Cancel all pending URLs associated with a scraper. Use this when you want to stop the bulk operation immediately.

curl -X PATCH "https://api.app.mrscraper.com/api/v1/scrapers-bulks/{scraperId}/cancel" \
  -H "x-api-token: YOUR_API_TOKEN"

Parameters:

  • {scraperId} - The scraper ID used for the bulk operation

Option 2: Cancel by Result ID

Cancel a specific bulk scraping job using the bulk result ID returned in the bulk rerun response. Use this when you already have the result ID, for example from the Result page, and want to cancel only that bulk operation.

curl -X PATCH "https://api.app.mrscraper.com/api/v1/scrapers-bulks/result/{resultId}/cancel" \
  -H "x-api-token: YOUR_API_TOKEN"

Parameters:

  • {resultId} - The bulk result ID from the bulk scraping response (e.g., ec5d81b9-55fa-4949-bba9-cceb448cb950)
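The two cancel calls differ only in their path, so a script can build either one from a single function. The sketch below uses Python's standard `urllib.request` and only the endpoints and `x-api-token` header shown in the curl commands; the function name `build_cancel_request` and the choice to URL-encode the ID are assumptions made for illustration.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.app.mrscraper.com/api/v1/scrapers-bulks"


def build_cancel_request(api_token, scraper_id=None, result_id=None):
    """Build (but do not send) a PATCH request for either cancel endpoint.

    Hypothetical helper: pass exactly one of scraper_id or result_id.
    """
    if (scraper_id is None) == (result_id is None):
        raise ValueError("Pass exactly one of scraper_id or result_id")
    if scraper_id is not None:
        # Option 1: cancel by scraper ID
        url = f"{BASE_URL}/{quote(str(scraper_id), safe='')}/cancel"
    else:
        # Option 2: cancel by bulk result ID
        url = f"{BASE_URL}/result/{quote(str(result_id), safe='')}/cancel"
    return urllib.request.Request(
        url, method="PATCH", headers={"x-api-token": api_token}
    )


if __name__ == "__main__":
    req = build_cancel_request(
        "YOUR_API_TOKEN",
        result_id="ec5d81b9-55fa-4949-bba9-cceb448cb950",
    )
    print(req.get_method(), req.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen(req)` and decode the JSON body. Building the request separately makes it easy to inspect the URL before anything is canceled.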

Response:

Both endpoints return the bulk result ID, the original URL list, and which URLs were successfully canceled:

{
  "message": "Successful operation!",
  "data": {
    "id": "ec5d81b9-55fa-4949-bba9-cceb448cb950",
    "bulkUrls": [
      "https://books.toscrape.com/catalogue/page-1.html",
      "https://books.toscrape.com/catalogue/page-2.html",
      "https://books.toscrape.com/catalogue/page-3.html",
      "https://books.toscrape.com/catalogue/page-4.html",
      "https://books.toscrape.com/catalogue/page-5.html"
    ],
    "canceledUrls": [
      "https://books.toscrape.com/catalogue/page-3.html",
      "https://books.toscrape.com/catalogue/page-4.html",
      "https://books.toscrape.com/catalogue/page-5.html"
    ]
  }
}
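To see at a glance which URLs were not canceled, you can compare `bulkUrls` with `canceledUrls`. The sketch below assumes the response shape shown above; the function name `summarize_cancellation` is hypothetical. The response does not say why a URL is absent from `canceledUrls`, so the code only reports the difference without guessing at a reason.

```python
def summarize_cancellation(response):
    """Split the original URL list into canceled and not-canceled URLs.

    Assumes the response shape from the example; the original
    order of bulkUrls is preserved in both lists.
    """
    data = response["data"]
    canceled = set(data["canceledUrls"])
    return {
        "result_id": data["id"],
        "canceled": [u for u in data["bulkUrls"] if u in canceled],
        "not_canceled": [u for u in data["bulkUrls"] if u not in canceled],
    }


if __name__ == "__main__":
    base = "https://books.toscrape.com/catalogue/page-{}.html"
    response = {
        "message": "Successful operation!",
        "data": {
            "id": "ec5d81b9-55fa-4949-bba9-cceb448cb950",
            "bulkUrls": [base.format(n) for n in range(1, 6)],
            "canceledUrls": [base.format(n) for n in range(3, 6)],
        },
    }
    summary = summarize_cancellation(response)
    print("Not canceled:")
    for url in summary["not_canceled"]:
        print(" ", url)
```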

For detailed API documentation, see the Bulk Scraping API reference.