This page explains how to set up a Manual Scraper in MrScraper. It’s a more customizable option for integrating a scraper into your apps and workflows. Unlike ScrapeGPT, you’ll need to provide the CSS selectors for the data you want.

For more on CSS selectors, check out this link.
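If you're new to CSS selectors, here is a minimal illustration of what a class selector targets. The HTML and class names below are made up for the example, and the tiny parser only emulates the behavior of a class selector such as `.event-title`:

```python
from html.parser import HTMLParser

# Hypothetical page markup; on a real site you would find the
# class names with your browser's DevTools (right-click -> Inspect).
HTML = """
<div class="event-card">
  <h3 class="event-title">Health Tech PHL</h3>
  <span class="event-date">Thursday, October 17</span>
</div>
<div class="event-card">
  <h3 class="event-title">Healing Hike</h3>
  <span class="event-date">Saturday, October 19</span>
</div>
"""

class ClassTextCollector(HTMLParser):
    """Collects the text of every element carrying a given class,
    i.e. what a CSS class selector like ".event-title" matches."""
    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.capturing = True

    def handle_data(self, data):
        if self.capturing and data.strip():
            self.results.append(data.strip())
            self.capturing = False

# ".event-title" selects every element with class="event-title"
collector = ClassTextCollector("event-title")
collector.feed(HTML)
print(collector.results)  # ['Health Tech PHL', 'Healing Hike']
```

In practice you would copy selectors like `.event-card` or `.event-title` straight from DevTools into the Manual Scraper form rather than matching them yourself.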

Requirements

  • MrScraper console account.
  • CSS selectors for the data that you want to retrieve.

Manual Scraping Example

In this example, we’ll retrieve event data from Luma, returning results based on the defined workflow.

Follow the steps below to set up a Manual Scraper and extract the data:

  1. Log in to your MrScraper App Dashboard.
  2. Navigate to the Scraper menu in the Sidebar.
  3. Click the Create Scraper button on the top right of the page.
  4. Select Manual as the scraper type, then fill in the Scraper Name and the default URLs. Then click the Create button.
  5. Navigate to the Workflow tab to define the workflow for the scraper by adding steps. The available workflow step types are demonstrated in the screenshot below:
    • Scrape data: Extracts the specified data from the target website.
    • Infinite scroll: Automates scrolling on pages with endless content to load more data.
    • Wait time: Pauses the scraping process for a set duration to allow dynamic content to load.
    • Click element: Simulates clicking an element, such as a button or link, to trigger actions or load more data.
    • Take screenshot: Captures a screenshot of the current webpage or a specific area for reference.
    • Input text: Automatically types text into fields, such as search bars or forms, to interact with the website.
    • Algolia Crawler: Scrapes data specifically from websites that use the Algolia search engine for content retrieval.
You can add up to 10 steps.
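Conceptually, a workflow is just an ordered list of these steps, capped at 10. A sketch of that idea (the `type` values mirror the list above, but every field name here is an illustration, not MrScraper's actual schema):

```python
# A workflow is an ordered list of steps, capped at 10.
# The "type" values mirror the step types listed above; all other
# field names are illustrative assumptions, not MrScraper's schema.
MAX_STEPS = 10

workflow = [
    {"type": "wait_time", "seconds": 3},            # let dynamic content load
    {"type": "infinite_scroll", "max_scrolls": 5},  # load more events
    {"type": "scrape_data", "fields": [
        {"name": "event_name", "selector": ".event-title", "data_type": "text"},
        {"name": "date", "selector": ".event-date", "data_type": "text"},
    ]},
]

def validate(workflow):
    """Reject workflows that exceed the 10-step limit."""
    if len(workflow) > MAX_STEPS:
        raise ValueError(f"A scraper supports at most {MAX_STEPS} steps")
    return True

print(validate(workflow))  # True
```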
  6. If you choose Scrape Data, a new step form as seen in the screenshot below will be created.

    You need to enter the name, CSS selector, data type, and quantity yourself.

  7. Recheck the result format by clicking the Preview Data button in the top right corner of the screen.

  8. If you want a list of objects complete with their properties, the first Scrape Data step needs to be a Collection (List of sub-items) with its quantity set to All Matches.

  9. You can find the Parent selector by right-clicking the page you want to scrape and selecting Inspect. Then click the Add item to collection button.

  10. After you click the button, fill in the same form for the collection items (the children of the parent collection you created in the previous step).

  11. Click Save changes when you’re done.

  12. Click the Run scraper button on the top right of the page to run the scraper.
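The Collection setup described above pairs a parent selector with child selectors resolved relative to each parent match. A sketch of that structure and the output shape it produces (every selector below is a hypothetical example, not Luma's real markup):

```python
# A "Collection (List of sub-items)" pairs a parent selector with
# child selectors resolved relative to each parent match.
# All selectors here are made-up examples, not Luma's real markup.
collection = {
    "name": "events",
    "parent_selector": ".event-card",   # one match per event
    "quantity": "all_matches",
    "children": [
        {"name": "event_name", "selector": ".event-title"},
        {"name": "date", "selector": ".event-date"},
    ],
}

# With quantity set to All Matches, the scraper returns one object
# per parent match, each populated from the child selectors:
expected_shape = {
    "events": [
        {"event_name": "...", "date": "..."},
        {"event_name": "...", "date": "..."},
    ]
}
print(sorted(expected_shape["events"][0]))  # ['date', 'event_name']
```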

Tutorial Video

Here is the tutorial demo video for creating a Manual Scraper in MrScraper.

And here is a snippet of the scraping result shown in the video:

{
    "events": [
        {
            "date": "Thursday, October 17",
            "time": "5:00 PM - 7:30 PM EDT",
            "venue": "Register to See Address",
            "event_name": "Health Tech PHL",
            "organizers": [
                "Ed Melendez",
                "David M. Nichols, MD",
                "Ed Melendez",
                "David M. Nichols, MD"
            ],
            "description": [
                "​Schedule:\n\n​5:00 - 5:45 Networking and refreshments\n\n​5:45 - 6:00 Welcome to NeuroFlow\n\n​6:00 - 6:30 Speaker (TBA)\n\n​6:30 - 7:30 Q&A and mingling\n\n​Our goal is to facilitate connections between all the players in the Philadelphia region: entrepreneurs, clinicians, technologists, activists, and builders. We’re interested in connecting people who recognize healthcare as a key opportunity facing our country: how do we make it more efficient, more equitable, and more effective, so that it reflects the aspirations of our country, the best of our communities, and the brilliance of humans?\n\n​There are no gatekeepers. Everyone of us deserves lives free of illness and infirmity, and everyone of us brings our respective talents and experience to the fore. Join us.\n\n​Participants at minimum should be committed to the Triple or Quadruple Aim in healthcare: reducing cost, improving quality, and making care more favorable to patients and providers.\n"
            ]
        },
        {
            "date": "Saturday, October 19",
            "time": "10:00 AM - 1:00 PM EDT",
            "venue": "Register to See Address",
            "event_name": "Healing Hike--PHILLY",
            "organizers": [
                "Hike+Heal",
                "Hike+Heal",
                "",
                "Erica",
                "",
                "Maggie Deptola",
                "Hike+Heal",
                "",
                "Erica",
                "",
                "Maggie Deptola"
            ],
            "description": [
                "​Join Hike Hosts Erica + Maggie for a 4-mile hike at Natural Lands' Wawa Preserve near Chester Heights, PA!\n\n​Classified as a moderately challenging route, where you can look forward to great hillside views and wildlife, perfect for a cool & sunny autumn day.\n\n​Please bring plenty of water, bug spray, sunscreen, and snacks! Hiking boots and long pants are encouraged. \n\n​​We will start our welcome circle at 10:20am, allowing ample time to park and use the restrooms. Hike starts promptly at 10:30am!\n\n​Hike Schedule:\n\n​10am--Check-In + Welcome Circle\n\n​10:30am--Healing Hike\n\n​12:30pm--Stretch + Closing Circle\n\n​Things to note:\n\n​Moderate level hike\n\n​​NOT stroller friendly\n\n​​Dog-friendly if your pup is leashed & okay with the mileage\n\n​​Bring plenty of water and snacks. Hiking boots and long pants are encouraged. [full checklist in confirmation email]\n\n​​This hike is rain or shine, so please bring a poncho or rain jacket in case of light rain\n\n​RESTROOMS: NOT available at the park (bring tissues and small plastic bag just in case you have to use nature's restroom)\n\n​PARKING: FREE parking lot\n\n​Please download the Luma app to have access to the group chat that will be utilized for all communication"
            ]
        },
        ...
    ]
}
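Note that the `organizers` arrays above contain duplicates and empty strings, because the child selector matched repeated elements on the page. A quick way to clean that up in your own post-processing code:

```python
import json

# Sample trimmed from the result above; "organizers" contains
# duplicates because the selector matched repeated page elements.
raw = """
{
  "events": [
    {
      "event_name": "Health Tech PHL",
      "organizers": ["Ed Melendez", "David M. Nichols, MD",
                     "Ed Melendez", "David M. Nichols, MD", ""]
    }
  ]
}
"""

data = json.loads(raw)
for event in data["events"]:
    # dict.fromkeys de-duplicates while preserving order,
    # and the comprehension drops empty strings.
    event["organizers"] = [o for o in dict.fromkeys(event["organizers"]) if o]

print(data["events"][0]["organizers"])
# ['Ed Melendez', 'David M. Nichols, MD']
```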

Features

The Manual Scraper offers a range of optional features that you can configure as needed when setting up your scraper, tailoring the process to your requirements.

  • Pagination: Configure settings for handling pagination to scrape additional content across multiple pages.
  • Scheduler: Set up the timing for when the scraper should run, allowing for automated or scheduled scrapes.
  • Proxy: Manage proxy settings to rotate or apply specific proxies during the scraping process.
  • Advanced: Adjust advanced options to fine-tune the scraper’s behavior or performance.
  • Parsers: Modify the extracted data by adding custom parsers.
  • Logs: View and monitor detailed logs of the scraping activity, including successes, errors, and other events for troubleshooting or auditing purposes.
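To give a feel for the kind of transformation a custom parser can apply, here is a plain post-processing sketch (written as standalone code, not MrScraper's parser API) that splits a scraped time range like the `"5:00 PM - 7:30 PM EDT"` values above into structured fields:

```python
from datetime import datetime

def parse_time_range(raw):
    """Split a scraped range like "5:00 PM - 7:30 PM EDT" into
    24-hour start/end times plus the timezone abbreviation."""
    time_part, _, tz = raw.rpartition(" ")   # -> "5:00 PM - 7:30 PM", "EDT"
    start, end = (s.strip() for s in time_part.split(" - "))
    return {
        "start": datetime.strptime(start, "%I:%M %p").strftime("%H:%M"),
        "end": datetime.strptime(end, "%I:%M %p").strftime("%H:%M"),
        "timezone": tz,
    }

print(parse_time_range("5:00 PM - 7:30 PM EDT"))
# {'start': '17:00', 'end': '19:30', 'timezone': 'EDT'}
```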