Article
Learn the common fields for article PDP endpoints and example domains.
Article PDP Cache agents extract structured data from article and news detail pages, including headline, body, dates, authors, media, and breadcrumbs.
Description
Article PDP agents return normalized fields for blog posts, news articles, and similar content: title, full text, HTML body, publication and modification dates (ISO 8601 and raw), authors, language, breadcrumbs, main image, lists of images/videos/audios, and page/canonical URLs.
Common Fields
| Field | Type | Description |
|---|---|---|
| headline | string | Article headline or title |
| articleBody | string | Full text content of the article |
| articleBodyHtml | string | HTML markup of the article body |
| description | string | Short summary or description of the article |
| datePublished | string | Publication date in ISO 8601 format |
| datePublishedRaw | string | Publication date as displayed on the page |
| dateModified | string | Last modified date in ISO 8601 format |
| dateModifiedRaw | string | Last modified date as displayed on the page |
| authors | array | List of article authors. Each item: name (string), nameRaw (string) |
| inLanguage | string | Language code of the article (e.g., en) |
| breadcrumbs | array | Navigation breadcrumb trail. Each item: url (string), name (string) |
| mainImage | object | Primary image. Structure: url (string) |
| images | array | All images in the article. Each item: url (string) |
| videos | array | All videos in the article. Each item: url (string) |
| audios | array | All audio files in the article. Each item: url (string) |
| url | string | URL of the article page |
| canonicalUrl | string | Canonical URL of the article |
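To make the field table concrete, here is a minimal client-side sketch in Python. It assumes the agent's response arrives as a JSON object decoded into a dict whose keys match the table, and that any field may be absent. The `Article` type and the helpers `parse_iso_date`, `author_names`, and `media_urls` are hypothetical names for illustration, not part of any published SDK.

```python
from datetime import datetime
from typing import Literal, Optional, TypedDict


# Hypothetical types mirroring the Common Fields table; total=False because
# we assume any field may be missing for a given page.
class Author(TypedDict, total=False):
    name: str
    nameRaw: str


class Breadcrumb(TypedDict, total=False):
    url: str
    name: str


class Media(TypedDict, total=False):
    url: str


class Article(TypedDict, total=False):
    headline: str
    articleBody: str
    articleBodyHtml: str
    description: str
    datePublished: str
    datePublishedRaw: str
    dateModified: str
    dateModifiedRaw: str
    authors: list[Author]
    inLanguage: str
    breadcrumbs: list[Breadcrumb]
    mainImage: Media
    images: list[Media]
    videos: list[Media]
    audios: list[Media]
    url: str
    canonicalUrl: str


def parse_iso_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 date string; return None if missing or unparseable."""
    if not value:
        return None
    # datetime.fromisoformat before Python 3.11 rejects a trailing "Z".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


def author_names(article: Article) -> list[str]:
    """Return normalized author names, skipping entries without a name."""
    return [a["name"] for a in article.get("authors", []) if a.get("name")]


def media_urls(article: Article, key: Literal["images", "videos", "audios"]) -> list[str]:
    """Collect the url values from one of the media arrays."""
    return [m["url"] for m in article.get(key, []) if m.get("url")]


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample: Article = {
        "headline": "Example headline",
        "datePublished": "2024-05-01T12:30:00Z",
        "authors": [{"name": "Jane Doe", "nameRaw": "By Jane Doe"}],
        "images": [{"url": "https://example.com/a.jpg"}],
    }
    print(parse_iso_date(sample.get("datePublished")))
    print(author_names(sample), media_urls(sample, "images"))
```

When `parse_iso_date` returns `None`, one option is to fall back to the matching `...Raw` field, since it preserves the date exactly as the page displayed it.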
Example Domains / Websites
Article PDP agents are typically available for commonly used news and blog domains, for example:
- BBC: e.g. `https://www.bbc.com/news/...` or `https://www.bbc.co.uk/news/articles/...`
- The New York Times: e.g. `https://www.nytimes.com/...`
- Medium: e.g. `https://medium.com/...`
- Reuters: e.g. `https://www.reuters.com/...`
- The Guardian: e.g. `https://www.theguardian.com/...`
- CNN: e.g. `https://edition.cnn.com/...`
- TechCrunch: e.g. `https://techcrunch.com/...`
- Wikipedia: e.g. `https://en.wikipedia.org/wiki/...`
Exact availability depends on the Marketplace; check the Marketplace for the full list of supported article domains.
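If you route URLs to an article agent on the client side, a simple pre-check is to test whether the URL's host equals, or is a subdomain of, a known domain. The sketch below assumes a hypothetical, locally maintained set built from the examples listed here; it is not the authoritative list (the Marketplace is), and `matching_domain` is an illustrative helper, not a platform API.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical local set derived from the example domains above.
# The Marketplace, not this set, decides which domains are supported.
EXAMPLE_ARTICLE_DOMAINS = frozenset({
    "bbc.com",
    "bbc.co.uk",
    "nytimes.com",
    "medium.com",
    "reuters.com",
    "theguardian.com",
    "cnn.com",
    "techcrunch.com",
    "wikipedia.org",
})


def matching_domain(url: str, domains: frozenset = EXAMPLE_ARTICLE_DOMAINS) -> Optional[str]:
    """Return the listed domain the URL's host equals or is a subdomain of, else None."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check label-wise suffixes, e.g. "edition.cnn.com", then "cnn.com", then "com",
    # so "notbbc.com" does not match "bbc.com".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in domains:
            return candidate
    return None
```

Matching by whole labels rather than plain string suffixes avoids false positives such as `notbbc.com`. Publications served from custom domains (for example, Medium publications on their own hostnames) would not match this set.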