Enterprise API general documentation

Consult this page if you are running, or considering running, Graphite's Enterprise Internal Links API.

Crawling and indexing

To index pages for related link selection, Graphite’s bot crawls the pages listed in the sitemap. Its user agent is GraphiteBot/1.0 (+https://www.graphitehq.com). The daily crawl starts at 00:00 UTC and processes 60-240 pages per minute.
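As a rough planning aid, the crawl rate can be turned into an estimated crawl window for a given sitemap size. The sketch below is illustrative only: the function names and the example page count are hypothetical, and it assumes the rate stays steady somewhere within the documented 60-240 pages per minute.

```python
from typing import Tuple

# Documented crawl rate bounds, in pages per minute.
SLOWEST_RATE = 60
FASTEST_RATE = 240


def crawl_duration_minutes(page_count: int, pages_per_minute: float) -> float:
    """Minutes needed to crawl page_count pages at a steady rate."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return page_count / pages_per_minute


def crawl_window_minutes(page_count: int) -> Tuple[float, float]:
    """Return (best case, worst case) crawl duration in minutes."""
    return (
        crawl_duration_minutes(page_count, FASTEST_RATE),
        crawl_duration_minutes(page_count, SLOWEST_RATE),
    )


if __name__ == "__main__":
    # Hypothetical sitemap of 24,000 pages.
    best, worst = crawl_window_minutes(24_000)
    print(f"Crawl takes between {best:.0f} and {worst:.0f} minutes")
```

For a hypothetical 24,000-page sitemap this gives a window of 100 to 400 minutes after the 00:00 UTC start.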

Related Links selection

Links for pages are selected to maximize relatedness subject to the constraint that every page receives at least k incoming links. The relatedness of a pair of pages is computed as the semantic similarity of the page text.

When a page does not have enough related pages, or has not yet been indexed, the API returns links to pages selected uniformly at random, without replacement, from the index.
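The fallback behavior can be pictured with a small sketch. This is not Graphite's implementation: the function name, its parameters, and the idea of topping up a short related list with random picks are assumptions made for illustration. It only shows what "random without replacement" means in practice, using the 'related' and 'random' link types from the response schema.

```python
import random
from typing import Dict, List, Optional


def select_links(
    related_urls: List[str],
    index_urls: List[str],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, str]]:
    """Return up to `count` links, topping up with random picks (hypothetical)."""
    rng = rng or random.Random()
    links = [{"url": u, "type": "related"} for u in related_urls[:count]]

    missing = count - len(links)
    if missing > 0:
        already_chosen = set(related_urls)
        pool = [u for u in index_urls if u not in already_chosen]
        # random.sample draws without replacement, so no URL repeats.
        for u in rng.sample(pool, min(missing, len(pool))):
            links.append({"url": u, "type": "random"})
    return links
```

A page that has not been indexed corresponds to an empty `related_urls` list, in which case every returned link has type 'random'.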

See the details of each /related-links endpoint for the number of related links currently returned in its responses.

The number of links selected for each page is the agreed-upon number configured for the related-links endpoint. We recommend 8 or more internal links per page for the most SEO benefit.

API

The links for a specific page can be retrieved from the API.

Host

URL: https://api.graphitehq.com/il/{{CLIENT}}/

Endpoints

{{PAGE_TYPE}}/related-links/

  • Description: Returns a list of related links for a page. If the application can’t find related links, it returns randomly selected links from the index.
  • URL: https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links
  • Method: GET
  • Allowed Cross-Origin Resource Sharing: True
  • Special Headers Required: None
  • HTTP Authentication: None
  • Input Parameters: Query Strings

Parameters

Query String Parameters

Links for a page are retrieved by passing the page's canonical URL in the url query string parameter.
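Because the canonical URL is itself a URL, it is safest to percent-encode it before placing it in the query string. The sketch below builds the request URL with the standard library; the helper name and the example client and page type values ("acme", "blog") are hypothetical stand-ins for {{CLIENT}} and {{PAGE_TYPE}}.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.graphitehq.com/il"


def related_links_url(client: str, page_type: str, page_url: str) -> str:
    """Build the related-links request URL for one canonical page URL."""
    # urlencode percent-encodes reserved characters such as ':', '/', '?'.
    query = urlencode({"url": page_url})
    return f"{BASE_URL}/{client}/{page_type}/related-links?{query}"


if __name__ == "__main__":
    print(related_links_url("acme", "blog", "https://www.example.com/a?b=1"))
```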

Schema

  • Status: 200

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "object",
    "description": "Successful response from the related-links/ API endpoint.",
    "required": [
        "message",
        "related_links"
    ],
    "properties": {
        "message": {
            "type": "string",
            "description": "Response results description."
        },
        "related_links": {
            "type": "array",
            "description": "Related links array containing related links to a single page.",
            "items": {
                "type": "object",
                "description": "Related link object containing data from a single related link to a page.",
                "required": [
                    "type",
                    "title",
                    "url",
                    "url_path"
                ],
                "properties": {
                    "type": {
                        "type": "string",
                        "description": "Link type: 'related' if the link was selected using related selection logic, or 'random' if it was selected uniformly at random without replacement."
                    },
                    "title": {
                        "type": "string",
                        "description": "Page title."
                    },
                    "url": {
                        "type": "string",
                        "description": "Page URL."
                    },
                    "url_path": {
                        "type": "string",
                        "description": "Page URL path."
                    }
                }
            }
        }
    }
}


Other fields from the index (see Available Link Fields) can be included in each related link object, if desired.
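A client may want to check a 200 response against the schema before using it. The following is a minimal hand-written check of the required fields only, not a full JSON Schema validator; the function name and the wording of the problem messages are hypothetical.

```python
from typing import Any, List

# Required properties of each related link object, per the 200 schema.
REQUIRED_LINK_FIELDS = ("type", "title", "url", "url_path")


def schema_problems(payload: Any) -> List[str]:
    """Return a list of problems found; an empty list means the payload passes."""
    if not isinstance(payload, dict):
        return ["response is not a JSON object"]

    problems: List[str] = []
    if not isinstance(payload.get("message"), str):
        problems.append("'message' is missing or not a string")

    links = payload.get("related_links")
    if not isinstance(links, list):
        problems.append("'related_links' is missing or not an array")
        return problems

    for i, link in enumerate(links):
        if not isinstance(link, dict):
            problems.append(f"related_links[{i}] is not an object")
            continue
        for field in REQUIRED_LINK_FIELDS:
            if not isinstance(link.get(field), str):
                problems.append(
                    f"related_links[{i}].{field} is missing or not a string"
                )
    return problems
```

Additional index fields in a link object are ignored by this check, which matches a schema that lists required properties without forbidding extra ones.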

  • Status: 4XX, 5XX

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "object",
    "description": "Error response from the related-links/ API endpoint.",
    "required": [
        "message"
    ],
    "properties": {
        "message": {
            "type": "string",
            "description": "Error message."
        }
    }
}
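Both schemas can be handled by one small parsing step on the client. The sketch below takes a status code and a raw response body, returns the related links on a 200 response, and otherwise raises an exception carrying the error message. The exception class, the function name, and the placeholder text used when no message is present are all hypothetical.

```python
import json
from typing import Any, Dict, List


class RelatedLinksError(Exception):
    """Raised when the endpoint does not return a usable 200 response."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_related_links(status: int, body: str) -> List[Dict[str, Any]]:
    """Return the related_links array, or raise RelatedLinksError."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Body was not valid JSON (json.JSONDecodeError subclasses ValueError).
        payload = None
    if not isinstance(payload, dict):
        payload = {}

    if status == 200 and isinstance(payload.get("related_links"), list):
        return payload["related_links"]

    # 4XX and 5XX responses carry a "message" property per the error schema.
    raise RelatedLinksError(
        status, payload.get("message", "no message in response body")
    )
```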

Example call

Request

cURL

curl --location --request GET 'https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links?url={{URL}}'

JavaScript Fetch

const requestOptions = {
  method: 'GET',
  redirect: 'follow'
};

// The response body is JSON, so parse it rather than reading raw text.
fetch("https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links?url={{URL}}", requestOptions)
  .then(response => response.json())
  .then(result => console.log(result))
  .catch(error => console.error('error', error));

Python

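import requests

# Pass the canonical page URL via params so that requests URL-encodes it.
endpoint = "https://api.graphitehq.com/il/{{CLIENT}}/{{PAGE_TYPE}}/related-links"
response = requests.get(endpoint, params={"url": "{{URL}}"})
# Print the decoded body as text rather than a bytes representation.
print(response.text)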

Response

{{EXAMPLE_RESPONSE}}

{
    "message": "...",
    "related_links": [
        {
            "type": "related",
            "title": "...",
            "url": "...",
            "url_path": "..."
        },
        ...
        {
            "type": "related",
            "title": "...",
            "url": "...",
            "url_path": "..."
        }
    ]
}
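Since each link carries a type, a client can separate links chosen by relatedness from random fallbacks, for example to monitor how often the fallback is used. This is a sketch with hypothetical helper names; it relies only on the 'related' and 'random' values described in the schema.

```python
from typing import Any, Dict, List, Tuple

Link = Dict[str, Any]


def split_by_type(related_links: List[Link]) -> Tuple[List[Link], List[Link]]:
    """Split links into (related, random) lists, preserving order."""
    related = [l for l in related_links if l.get("type") == "related"]
    fallback = [l for l in related_links if l.get("type") == "random"]
    return related, fallback


def random_share(related_links: List[Link]) -> float:
    """Fraction of links that were random fallbacks (0.0 for an empty list)."""
    if not related_links:
        return 0.0
    _, fallback = split_by_type(related_links)
    return len(fallback) / len(related_links)
```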

Index

Available Link Fields

The index has several fields with page information obtained from crawling. All of these fields are available for export through the related-links/ endpoint response:

  • text (text): Page plain text content.
  • title (text): Page title.
  • url (text): Page canonical URL.
  • url_path (text): Page URL path.
  • {{ADDITIONAL_FIELDS_FROM_INDEX}}

Current Response Link Fields

  • title
  • type (added when processing the API request)
  • url
  • url_path
  • {{ADDITIONAL_FIELDS_FROM_INDEX}}

Uptime and latency

The API is built on standard AWS services. As of May 27, 2022, it has had no major outages and has maintained 99.9% uptime. The average response time is approximately 150 ms.

Request rate limits

The API endpoints do not enforce request rate limits; however, we encourage keeping traffic under 20 requests/second per endpoint. At about that rate, updating data for a set of 10k pages takes less than 10 minutes (10,000 requests / 20 requests per second = 500 seconds).

When fetching data from the API endpoints in batches, plan jobs around the number of pages and the data update period, which can vary from one day to one week.
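A batch job can stay within the suggested 20 requests/second by spacing its calls. The sketch below is one simple way to do that, not a prescribed client: `fetch_throttled` and its parameters are hypothetical, and `fetch` stands for whatever function performs the actual request. The clock and sleep functions are injectable so the pacing logic can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def fetch_throttled(
    page_urls: Iterable[str],
    fetch: Callable[[str], T],
    max_per_second: float = 20.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, T]:
    """Call fetch for each URL, starting at most max_per_second calls per second."""
    min_interval = 1.0 / max_per_second
    results: Dict[str, T] = {}
    next_allowed = clock()
    for page_url in page_urls:
        now = clock()
        if now < next_allowed:
            # Wait until the next request slot opens.
            sleep(next_allowed - now)
        next_allowed = max(now, next_allowed) + min_interval
        results[page_url] = fetch(page_url)
    return results
```

With the default spacing of 0.05 seconds between request starts, 10,000 pages take roughly 500 seconds plus the time spent in the requests themselves.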

Caching

Server-side rendering with caching is strongly recommended. The results can be cached using the endpoint URL with query string parameters as the key.

The links are updated at most once a day, so a one-day TTL is appropriate.
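The caching recommendation can be sketched as a small in-memory cache keyed by the full request URL, with entries expiring after one day. This is an illustration under stated assumptions: the class name and its interface are hypothetical, `fetch` stands for whatever function calls the API, and a production setup would more likely use a shared cache such as a CDN or key-value store.

```python
import time
from typing import Any, Callable, Dict, Tuple

ONE_DAY_SECONDS = 24 * 60 * 60


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = ONE_DAY_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps request URL -> (expiry time, cached value).
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, request_url: str, fetch: Callable[[str], Any]) -> Any:
        """Return the cached value for request_url, fetching it if absent or expired."""
        now = self._clock()
        entry = self._entries.get(request_url)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = fetch(request_url)
        self._entries[request_url] = (now + self._ttl, value)
        return value
```

Using the endpoint URL including its query string as the key means each canonical page URL gets its own cache entry.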