Map & General Agents

This workflow uses a two-agent approach to scrape entire websites.

This example workflow uses:

  • Map Agent: Crawls a website and discovers all URLs that match the defined criteria
  • General Agent: Extracts detailed data from each discovered page

Step 1: Set Up Both Agents

Create and configure the Map Agent and General Agent so each one is ready to perform its specific role in the workflow.

Set Up the Manual Trigger

  1. Add a Manual Trigger node called "When clicking 'Execute workflow'".
  2. This allows you to run the complete two-agent workflow on demand.

Load Map Agent Configuration

  1. Add a Google Sheets node called "Get Map Agent Scraper".
  2. Select the Read Rows or Lookup operation.
  3. Authenticate with your Google account.
  4. Select the Google Sheets file that stores your Map Agent Scraper ID and target URL.

Note

If you have not yet created a spreadsheet containing scraper IDs and target URLs, refer to the
Create a Scraper guide to configure your Google Sheets.

  5. This node reads the Map Agent scraper ID and target URL from your sheet.
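
For reference, the item this node emits should look roughly like this sketch. The key names come from your sheet's column headers and must match the expressions used in Step 2 ({{ $json.mapScraperId }} and {{ $json.mapTargetUrl }}); the values shown are placeholders, not real IDs or URLs:
// Approximate output of "Get Map Agent Scraper" (one item per sheet row).
// Column headers become JSON keys; the values below are placeholders.
{
  "mapScraperId": "map-agent-123",
  "mapTargetUrl": "https://example.com/listings"
}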

Load General Agent Configuration

  1. Add a second Google Sheets node called "Get General Agent Scraper".
  2. Connect it after the "Get Map Agent Scraper" node.
  3. Select the Google Sheets file that stores your General Agent Scraper ID and target URL.

Note

If you have not yet created a spreadsheet containing scraper IDs and target URLs, refer to the
Create a Scraper guide to configure your Google Sheets.

  4. This loads the General Agent scraper ID for extracting property details.
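
As with the Map Agent sheet, the item this node emits should roughly match the expression used later in Step 3 ({{ $('Get General Agent Scraper').item.json.generalScraperId }}); the value below is a placeholder:
// Approximate output of "Get General Agent Scraper".
// The generalScraperId key must match your sheet's column header.
{
  "generalScraperId": "general-agent-456"
}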

Step 2: Run the Map Agent

Execute the Map Agent to discover all URLs on the website and filter them.

Run the Map Agent

  1. Add the MrScraper node called "Run map agent scraper".
  2. Select Map Agent as the operation.
  3. Configure using values from Google Sheets:
    • Scraper ID: {{ $json.mapScraperId }}
    • URL: {{ $json.mapTargetUrl }}
    • Max Pages: Set how many pages to crawl (e.g., 100)
    • Limit: Maximum URLs to discover (e.g., 200)
    • Include Patterns: URL pattern to match (e.g., "property-detail")
  4. The Map Agent will crawl the website and discover all matching URLs.
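
The exact response format depends on your MrScraper scraper, but the filter code in the next step assumes the discovered URLs arrive under data.urls, roughly like this sketch (the URLs are placeholders):
// Assumed shape of the "Run map agent scraper" output item.
// The next Code node reads the URL list from json.data.urls.
{
  "data": {
    "urls": [
      "https://example.com/property-detail/101",
      "https://example.com/property-detail/102",
      "https://example.com/about-us"
    ]
  }
}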

Filter & Limit Discovered URLs

  1. Add a Code node in JavaScript called "Filter & Limit Link".
  2. Filter URLs by pattern and limit the total number:
// Configuration
const MAX_URLS = 20; // Change this to limit how many URLs you want

// Get the data from the previous node
const inputData = $input.all();

// Extract URLs from the response
let urls = [];

if (inputData.length > 0 && inputData[0].json.data && inputData[0].json.data.urls) {
  urls = inputData[0].json.data.urls;
}

// Filter URLs that contain your target pattern
const filteredUrls = urls.filter(url => url.includes('property-detail'));

// Limit the number of URLs
const limitedUrls = filteredUrls.slice(0, MAX_URLS);

// Return as separate items for looping
return limitedUrls.map((url, index) => ({
  json: {
    url: url,
    index: index + 1,
    totalUrls: limitedUrls.length
  }
}));
  3. Adjust MAX_URLS and the filter pattern (property-detail) to match your needs.

Step 3: Process URLs with General Agent

Loop through discovered URLs and extract detailed data using the General Agent.

Loop Through Discovered URLs

  1. Add a Split in Batches node called "Looping Detail Page url".
  2. This processes each discovered URL one at a time.
  3. Keep Reset unchecked to continue looping.
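
Each iteration receives one of the items produced by the "Filter & Limit Link" node, so the current item inside the loop looks like this (the URL is a placeholder):
// Shape of the current item inside the loop, as returned by "Filter & Limit Link"
{
  "url": "https://example.com/property-detail/101",
  "index": 1,
  "totalUrls": 20
}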

Run the General Agent

  1. Add the MrScraper node inside the loop called "Run general agent scraper".
  2. Select General Agent as the operation.
  3. Configure using the scraper ID from Google Sheets:
    • Scraper ID: {{ $('Get General Agent Scraper').item.json.generalScraperId }}
    • URL: {{ $json.url }}
  4. This extracts detailed information from each discovered page.
  5. Connect this node back to the loop node to continue processing.

Step 4: Export the Results

Finally, export the collected data to Google Sheets and send a notification email via Gmail.

Flatten the JSON Data

  1. Add a Code node in JavaScript called "Flatten Object".
  2. Convert the nested JSON into a flat structure:
function flattenObject(obj, prefix = '', result = {}) {
  for (const key in obj) {
    if (!Object.prototype.hasOwnProperty.call(obj, key)) continue;

    const newKey = prefix ? `${prefix}_${key}` : key;
    const value = obj[key];

    if (value === null || value === undefined) {
      result[newKey] = null;
    } else if (Array.isArray(value)) {
      result[newKey] = value.length ? value.join(', ') : null;
    } else if (typeof value === 'object' && !(value instanceof Date)) {
      flattenObject(value, newKey, result);
    } else {
      result[newKey] = value;
    }
  }
  return result;
}

const items = $input.all();
const output = items.map(item => {
  const flattened = flattenObject(item.json);
  return { json: flattened };
});

return output;
  3. This prepares the data for spreadsheet export.
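
As an illustration of what this step does, assume the General Agent returned a nested object (the field names here are hypothetical examples, not what your scraper will necessarily return):
// Hypothetical nested input from the General Agent (example field names only)
const nestedExample = {
  property: { price: 250000, address: { city: 'Austin' } },
  images: ['a.jpg', 'b.jpg']
};

// flattenObject(nestedExample) produces a single flat row:
// {
//   property_price: 250000,
//   property_address_city: 'Austin',
//   images: 'a.jpg, b.jpg'
// }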

Save to Google Sheets

  1. Add a Google Sheets node called "Get row(s) in sheet".
  2. Select the Append Row operation.
  3. Authenticate with your Google account.
  4. Select your destination spreadsheet and sheet (can be different from your configuration sheet).
  5. Map the flattened data fields to your columns.

Send Email Notification

  1. Add a Gmail node called "Send a message".
  2. Configure:
    • To: Your email address
    • Subject: "Website Scraping Complete"
    • Message: Include a summary of the scraped pages or a link to the results spreadsheet (see the example below)
  3. This notifies you when the entire workflow is complete.
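
As a sketch, the Message field can use n8n expressions to pull in the URL count from the "Filter & Limit Link" node; the spreadsheet link below is a placeholder, and the node name must match your workflow exactly:
Website scraping complete.
URLs processed: {{ $('Filter & Limit Link').first().json.totalUrls }}
Results: https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID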
