Default fields returned by crawler

Internal Links API default crawler fields

A minimum set of fields is required to build a basic rich linking object. These fields are URL, Title, and Text.

Graphite's Internal Links API crawler also picks up optional fields when the corresponding metadata is found on the page being crawled. When found, these fields are included in the API response.

Minimum required fields

URL

The URL of the page that will be provided as the link URL. If not declared, the Internal Links API will take the canonical tag (<link rel="canonical"> inside <head>) as the link URL. If the canonical tag is not present, the Internal Links API will take the last URL the crawler sees after following all redirects.
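
The precedence described for the URL field can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Graphite's implementation: the names resolve_link_url and CanonicalFinder are hypothetical, the declared value is assumed to have been extracted already, and final_url stands for the last URL seen after redirects.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag.

    Simplification: this sketch does not check that the tag sits inside <head>.
    """

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def resolve_link_url(declared: Optional[str], html: str, final_url: str) -> str:
    """Pick the link URL: declared value, then canonical tag, then final URL."""
    if declared:
        return declared
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or final_url


page = '<html><head><link rel="canonical" href="https://example.com/a"></head></html>'
print(resolve_link_url(None, page, "https://example.com/a?utm_source=x"))
```

With no declared URL, the sketch returns the canonical URL; with neither a declared URL nor a canonical tag, it returns the final URL it was given.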

Title

The title of the page that will be provided as the link title. If not declared, the Internal Links API will take the first <h1> element found on the page as the link title. If no <h1> is found, it falls back to the Open Graph og:title property and then to the page's <title> element, in that order of precedence.
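
As a rough illustration of the title fallback, the sketch below checks a declared value first, then the first <h1>, then og:title, then <title>. It reflects one reading of the precedence described above; TitleCandidates and resolve_title are hypothetical names, and the real crawler may normalize text differently.

```python
from html.parser import HTMLParser


class TitleCandidates(HTMLParser):
    """Collects the first <h1> text, the og:title value, and the <title> text."""

    def __init__(self):
        super().__init__()
        self.h1 = None
        self.og_title = None
        self.title = None
        self._capture = None  # name of the tag whose text is being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "og:title":
            self.og_title = self.og_title or attr.get("content")
        elif tag in ("h1", "title") and self._capture is None and getattr(self, tag) is None:
            # Start capturing only the first non-empty occurrence of each tag.
            self._capture, self._buffer = tag, []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            setattr(self, tag, text or None)
            self._capture = None


def resolve_title(declared, html):
    """Pick the link title: declared value, first <h1>, og:title, then <title>."""
    if declared:
        return declared
    found = TitleCandidates()
    found.feed(html)
    return found.h1 or found.og_title or found.title
```

Text nested inside the heading, such as an <em> element, is kept because the sketch captures all character data until the closing </h1>.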

Text

(multiple allowed) The page's textual content. The Internal Links API uses this to find related pages when building related links. If not declared, the Internal Links API will take the text content within the page's <body>. Note that the <body> may contain noise such as headers, footers, and sidebars.
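
To see why undeclared text can be noisy, the following sketch extracts everything inside <body> from a small page. It is an assumption-laden illustration: BodyText and body_text are hypothetical names, and skipping <script> and <style> is a choice made here, not documented crawler behavior.

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collects all text inside <body>, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def body_text(html):
    """Return the text found within <body>, joined with single spaces."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><body><header>Home | About</header>"
    "<main><p>How to bake bread.</p></main>"
    "<footer>Example Inc.</footer></body></html>"
)
print(body_text(page))  # Home | About How to bake bread. Example Inc.
```

The navigation and footer text ends up alongside the article text, which is the kind of noise that declaring the Text field explicitly avoids.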

Optional fields included when found

Image

The link thumbnail URL, that is, the image that should be used as the link image. Provide an image of the desired size, because the Internal Links API does not store or process images. If not declared, the Internal Links API will take the Open Graph og:image property if present.

Description

The link description. This text is used as the link description and usually shows a snippet of the page content. If not declared, the Internal Links API will take the Open Graph og:description property or the HTML <meta name="description"> element, in that order of precedence, depending on their existence.

Modified Time

The page’s modified time. It is an ISO 8601 timestamp string. If not declared, the Internal Links API will take the Open Graph article:modified_time property if present.

Published Time

The page’s published time. It is an ISO 8601 timestamp string. If not declared, the Internal Links API will take the Open Graph article:published_time property if present.
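
Because both time fields are ISO 8601 strings, it can help to check a value before publishing it in page metadata. The helper below is a minimal sketch using Python's standard library; parse_iso8601 is a hypothetical name, and datetime.fromisoformat accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime


def parse_iso8601(value):
    """Parse an ISO 8601 timestamp string, returning None when it is invalid."""
    text = value.strip()
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


print(parse_iso8601("2024-03-05T14:30:00Z"))  # 2024-03-05 14:30:00+00:00
print(parse_iso8601("March 5, 2024"))         # None
```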

Author

(multiple allowed) The author name of an article-like page (blog post, recipe, etc.). It can be declared multiple times. If not declared, the Internal Links API will take the Open Graph article:author property or the HTML <meta name="author"> element, in that order of precedence, depending on their existence.
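
The fallbacks for the optional fields can be sketched together as a lookup over the page's <meta> tags. This is an illustration only: MetaCollector, first_present, and optional_fields are hypothetical names, the dictionary keys are not the API's response field names, and explicitly declared values (which take priority) are left out.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta> content values keyed by their property or name attribute."""

    def __init__(self):
        super().__init__()
        self.values = {}  # key -> list of content strings, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Simplification: property and name attributes share one key space.
        key = attr.get("property") or attr.get("name")
        content = attr.get("content")
        if key and content:
            self.values.setdefault(key, []).append(content)


def first_present(values, *keys):
    """Return the values of the first key that has any, in precedence order."""
    for key in keys:
        if values.get(key):
            return values[key]
    return []


def optional_fields(html):
    """Apply the Open Graph and HTML meta fallbacks for the optional fields."""
    collector = MetaCollector()
    collector.feed(html)
    v = collector.values

    def single(*keys):
        found = first_present(v, *keys)
        return found[0] if found else None

    return {
        "image": single("og:image"),
        "description": single("og:description", "description"),
        "modified_time": single("article:modified_time"),
        "published_time": single("article:published_time"),
        # Author allows multiple values, so the whole list is kept.
        "authors": first_present(v, "article:author", "author"),
    }
```

When a page carries both article:author and <meta name="author"> tags, only the article:author values are returned, mirroring the stated order of precedence.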

Custom fields

Graphite's Internal Links API also supports custom fields. Custom properties can be added by using the prefix graphite:custom, i.e., graphite:custom:{property}.
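
The sketch below shows how properties with the graphite:custom: prefix might be gathered from a page. It assumes, without confirmation from this page, that custom properties are declared as <meta property="..." content="..."> tags in the same way as Open Graph properties; the Structured Data Specification is the authoritative source for the actual syntax. CustomFieldCollector and custom_fields are hypothetical names.

```python
from html.parser import HTMLParser

CUSTOM_PREFIX = "graphite:custom:"


class CustomFieldCollector(HTMLParser):
    """Collects values of meta properties that start with graphite:custom:.

    Assumption: custom fields are declared as <meta property=... content=...>.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or ""
        content = attr.get("content")
        if key.startswith(CUSTOM_PREFIX) and content is not None:
            name = key[len(CUSTOM_PREFIX):]  # the {property} part
            if name:
                self.fields[name] = content


def custom_fields(html):
    """Return a dict mapping each custom property name to its content value."""
    collector = CustomFieldCollector()
    collector.feed(html)
    return collector.fields


page = '<head><meta property="graphite:custom:category" content="recipes"></head>'
print(custom_fields(page))  # {'category': 'recipes'}
```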

For more information about adding custom fields, please review our Structured Data Specification for custom metadata.