PagerDuty
PagerDuty is an incident response platform that allows developers to manage alerts, schedule rotations, define escalation policies, and more.
Integrating PagerDuty with Cortex allows you to:
Pull in PagerDuty services, on-call schedules, and escalation policies
The on-call user or team will appear in the Current On-call block on an entity's details page.
You can also view on-call information on an entity page in its side panel under Integrations > On-call.
Trigger incidents in PagerDuty directly from Cortex
Automatically surface the most vital information about entity health and metadata when an incident is triggered by using Cortex's On-Call Assistant tool
The On-Call Assistant automatically notifies users via Slack when an incident is triggered. The notifications include runbooks, links, dependencies, and key information about the affected entity.
View incidents from PagerDuty in an entity's event timeline
View on-call information from PagerDuty in the dev homepage
Use PagerDuty metrics in Eng Intelligence to gain insight into services, incident response, and more.
Create Scorecards that track progress and drive alignment on projects involving your on-call schedules and alerts
How to configure PagerDuty with Cortex
Prerequisites
Before getting started:
Create a PagerDuty API key.
When adding the API key, you can set read or write permissions.
Read: Enables Cortex to read any and all data from PagerDuty
Write: Allows users to trigger incidents from an entity page in Cortex, and enables On-Call Assistant
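Before pasting the key into Cortex, it can help to confirm that PagerDuty accepts it. The sketch below builds a read-only request against the PagerDuty REST API using the standard token header. The function name build_services_request and the limit parameter are illustrative choices, not part of Cortex.

```python
import urllib.request

PAGERDUTY_API = "https://api.pagerduty.com"


def build_services_request(api_key: str, limit: int = 1) -> urllib.request.Request:
    """Build (but do not send) a GET /services request authenticated with api_key."""
    return urllib.request.Request(
        f"{PAGERDUTY_API}/services?limit={limit}",
        headers={
            # PagerDuty REST API keys are sent as "Token token=<key>".
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
        method="GET",
    )
```

Passing the returned request to urllib.request.urlopen should succeed for a valid key and raise an HTTPError for a rejected one. A successful read does not tell you whether the key also has write access.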
Configure the integration in Cortex
In Cortex, navigate to the PagerDuty settings page:
In Cortex, click your avatar in the lower left corner, then click Settings.
Under "Integrations", click PagerDuty.
Click Add configuration.
Configure the integration:
API key: Enter the API key you created in PagerDuty.
If the Read-only API key option is toggled off, Cortex will assume the provided API key has write permissions.
Click Save.
If you’ve set everything up correctly, you’ll see the option to Remove Integration in settings.
You can also use the Test configuration button to confirm that the configuration was successful. If your configuration is valid, you’ll see a banner that says “Configuration is valid. If you see issues, please see documentation or reach out to Cortex support.”
Enabling the On-Call Assistant
At this stage, you can enable the Cortex On-Call Assistant, which notifies users via Slack when an incident is triggered in PagerDuty. See the documentation for instructions: On-Call Assistant.
Note that On-Call Assistant will only work for service-level PagerDuty registrations since these notifications are related to affected services.
How to connect Cortex entities to PagerDuty
Discovery
By default, Cortex will use the entity tag (e.g. my-entity) or its name as the "best guess" for PagerDuty services. For example, if your entity tag is my-entity, then the corresponding service in PagerDuty should also be my-entity.
If your PagerDuty services don’t cleanly match the Cortex entity tag or name, you can override this in the Cortex entity descriptor.
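To check in advance which entities the "best guess" is likely to catch, you can compare tags and names against your PagerDuty service names yourself. This is only a sketch of the idea, not Cortex's actual matching logic. The function name guess_service is hypothetical, and the case-insensitive comparison is an assumption.

```python
from typing import Optional


def guess_service(entity_tag: str, entity_name: str, service_names: list[str]) -> Optional[str]:
    """Return the first service name equal to the entity tag or name, ignoring case."""
    # ASSUMPTION: matching ignores case and surrounding whitespace.
    candidates = {entity_tag.strip().lower(), entity_name.strip().lower()}
    for service in service_names:
        if service.strip().lower() in candidates:
            return service
    return None
```

Entities for which this returns None are the ones most likely to need an explicit override in the entity descriptor.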
Considerations for registering PagerDuty entities
Cortex recommends setting up PagerDuty at the service level by registering service entities with PagerDuty services, rather than configuring team entities with a PagerDuty schedule.
If PagerDuty is set up on a service level, you can see current on-call information listed within a given service's page. If PagerDuty is set up on the team level, you will only be able to view on-call rotation information from a team page.
Other benefits to setting up PagerDuty on a service level include:
Structuring PagerDuty 1-1 with services enables better alert routing and analytics, something that organizations struggle more with when PagerDuty is set up on a team level.
With a service-level setup, it’s easier to enforce all services to have a compliant on-call policy enacted in PagerDuty, especially when making use of Scorecards.
The service-level setup is less reliant on team members tagging incidents with service information because services and incidents are already linked.
You will gain the ability to get data from your Cortex catalog into PagerDuty, such as tier/criticality. By tying the service entities in the catalog with those in PagerDuty, you can automate processes and streamline severity protocols.
Editing the entity descriptor
For a given entity, you can define the PagerDuty service, schedule, or escalation policy within the entity’s YAML. You can only set up one of these three options per entity.
Each of these has the same field definitions:
id (required): PagerDuty ID for the service, schedule, or escalation policy
type (required): SERVICE, SCHEDULE, or ESCALATION_POLICY
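The two required fields can be checked before they go into an entity's YAML. The following is a minimal sketch: the helper name pagerduty_block is hypothetical, and the enclosing keys x-cortex-oncall and pagerduty are an assumption that this page does not state, so verify them against the Cortex entity descriptor reference.

```python
VALID_TYPES = {"SERVICE", "SCHEDULE", "ESCALATION_POLICY"}


def pagerduty_block(pd_id: str, pd_type: str) -> dict:
    """Validate id and type, then return them as a descriptor fragment."""
    if not pd_id:
        raise ValueError("id is required")
    if pd_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    # ASSUMPTION: the key names "x-cortex-oncall" and "pagerduty" are not
    # given in this page; confirm them before relying on this structure.
    return {"x-cortex-oncall": {"pagerduty": {"id": pd_id, "type": pd_type}}}
```

For example, pagerduty_block("PXXXXXX", "SERVICE") uses a placeholder ID. Serialized to YAML, the result is a nested mapping with one id and one type, which reflects the rule that an entity takes only one of the three options.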
Define a PagerDuty service
Find the service ID value in PagerDuty under Configuration > Services. The URL for the service contains the ID after https://cortexapp.pagerduty.com/services/. You can only configure one service ID per entity.
Define a schedule
Find a schedule ID in PagerDuty under People > On-call schedules. Click the desired schedule to view its ID in the URL, after https://cortexapp.pagerduty.com/schedules#. You can only configure one schedule per entity.
Define an escalation policy
Find the escalation policy ID in PagerDuty under People > Escalation Policies. Click the desired policy to view its ID in the URL, after https://cortexapp.pagerduty.com/escalation_policies#. You can only configure one escalation policy per entity.
You can only set up one of the three options above per entity.
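If you are copying IDs out of many PagerDuty URLs, a small helper can extract them. The sketch below assumes the URL shapes described above: the ID is the last path segment for services and the fragment after # for schedules and escalation policies. The function name and the sample IDs in the usage note are hypothetical.

```python
from urllib.parse import urlparse


def pagerduty_id_from_url(url: str) -> str:
    """Extract a PagerDuty ID from a service, schedule, or escalation policy URL."""
    parsed = urlparse(url)
    # Schedules and escalation policies carry the ID after "#".
    if parsed.fragment:
        return parsed.fragment
    # Services carry the ID as the final path segment, e.g. /services/<id>.
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) >= 2:
        return segments[-1]
    raise ValueError(f"no PagerDuty ID found in {url!r}")
```

For instance, a service URL ending in /services/PABC123 yields PABC123, and a schedule URL ending in /schedules#PDEF456 yields PDEF456.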
Identity mappings
Cortex maps email addresses in your PagerDuty instance to email addresses that belong to team members in Cortex. When identity mapping is set up, users will be able to see their personal on-call status from the developer homepage.
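To see which PagerDuty users would line up with Cortex team members, you can run the same kind of email comparison on exported data. This is an illustration of the idea rather than Cortex's implementation. It assumes each PagerDuty user is a dict with id and email keys, and that matching ignores case.

```python
def map_identities(pagerduty_users: list[dict], cortex_emails: list[str]) -> dict[str, str]:
    """Map PagerDuty user IDs to Cortex email addresses with a matching email."""
    # ASSUMPTION: emails are compared case-insensitively.
    by_email = {email.strip().lower(): email for email in cortex_emails}
    mapping: dict[str, str] = {}
    for user in pagerduty_users:
        key = user.get("email", "").strip().lower()
        if key in by_email:
            mapping[user["id"]] = by_email[key]
    return mapping
```

PagerDuty users missing from the result have no matching Cortex email, so they would not see a personal on-call status on the developer homepage.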
Expected results
Entity pages
Once the PagerDuty integration is set up, you’ll be able to view current on-call information in the "on-call" block on an entity details page. In the left sidebar of an entity, click On-call & incidents to view on-call information, escalation policy, service, and incidents.
The escalation policy and PagerDuty service details are hyperlinked to the corresponding pages in your PagerDuty instance.
Dev homepage
The PagerDuty integration enables Cortex to pull on-call information into the on-call block on the Dev homepage. On-call data from PagerDuty is refreshed every 60 minutes.
Eng Intelligence
Cortex also pulls in metrics from PagerDuty for Eng Intelligence. This tool will display MTTR, incidents opened, and incidents opened per week.
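As a rough illustration of what an MTTR figure represents, the sketch below averages the time between an incident opening and resolving. Cortex's exact definition may differ. The function name and the (opened, resolved) tuple layout are assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Optional


def mean_time_to_resolve(
    incidents: list[tuple[datetime, Optional[datetime]]],
) -> Optional[timedelta]:
    """Average (resolved - opened) over resolved incidents; None if there are none."""
    durations = [resolved - opened for opened, resolved in incidents if resolved is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Unresolved incidents are skipped here, which is one of several reasonable conventions.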
Notifications
If you have a Slack integration set up, you can also use the /cortex oncall Slack Bot command to retrieve current on-call information. This feature works for both services and teams with registered PagerDuty schedules or escalation policies.
Scorecards and CQL
With the integration, you can create Scorecard rules and write CQL queries based on PagerDuty data.
See more examples in the CQL Explorer in Cortex.
Trigger an incident
As described above under Editing the entity descriptor, a given entity can have a PagerDuty service, schedule, or escalation policy defined. Only entities with a PagerDuty service defined will include the option to trigger an incident directly from Cortex.
Your PagerDuty API key must include the write permission in order to trigger incidents from an entity.
While viewing an entity in Cortex, follow these steps to trigger an incident in PagerDuty:
In Cortex, navigate to an entity. On the left side of an entity details page, click On-call & incidents.
In the upper right side of the entity's "On-call" page, click Trigger incident.
Configure the incident modal:
Summary: Enter a title for the incident.
Description: Enter a description of the incident.
Severity: Select a severity level.
At the bottom of the modal, click Trigger incident.
A confirmation screen will appear. In the confirmation, click the link to view the incident in PagerDuty.
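For context, an equivalent incident can be created directly through the PagerDuty REST API, which helps when checking that a key really has write access. The sketch below only builds the request. It is not how Cortex necessarily calls PagerDuty, and the function name and arguments are hypothetical. The modal's Severity field is left out because this page does not say how Cortex maps it onto PagerDuty fields.

```python
import json
import urllib.request


def build_incident_request(
    api_key: str, from_email: str, service_id: str, summary: str, description: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /incidents request for one service."""
    payload = {
        "incident": {
            "type": "incident",
            "title": summary,
            "service": {"id": service_id, "type": "service_reference"},
            "body": {"type": "incident_body", "details": description},
        }
    }
    return urllib.request.Request(
        "https://api.pagerduty.com/incidents",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            # PagerDuty requires the email of a valid user in the From header.
            "From": from_email,
        },
        method="POST",
    )
```

The request targets a service ID, consistent with the requirement that only entities with a PagerDuty service defined can trigger incidents.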
Background sync
Cortex performs the following background jobs for the PagerDuty integration:
On-call: On-call information displayed on the developer homepage is refreshed every 60 minutes.
Services and incidents: Services used for automapping and active incidents viewable in the catalog are fetched approximately every 5 minutes, or less often if a refresh takes longer than that.
Users: User data for identity mapping is synced daily at 10 a.m. UTC.
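When a newly added PagerDuty user has not yet appeared in identity mapping, it can be useful to work out when the next daily user sync is due. The sketch below computes the next 10:00 UTC after a given moment. The function name is hypothetical, and the input is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, time, timedelta, timezone


def next_user_sync(now: datetime) -> datetime:
    """Return the next 10:00 UTC strictly after now (now must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    today_run = datetime.combine(now_utc.date(), time(10, 0), tzinfo=timezone.utc)
    # If today's run has already started or passed, the next one is tomorrow.
    return today_run if now_utc < today_run else today_run + timedelta(days=1)
```

The same approach works for the 60-minute and 5-minute jobs by adding the interval to the last observed refresh time.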
Still need help?
The following options are available to get assistance from the Cortex Customer Engineering team:
Email: help@cortex.io, or open a support ticket in the in-app Resource Center
Chat: Available in the Resource Center
Slack: Users with a connected Slack channel will have a workflow added to their account. From here, you can either @CortexTechnicalSupport or add a :ticket: reaction to a question in Slack, and the team will respond directly.
Don’t have a Slack channel? Talk with your Customer Success Manager.