Temporal Cloud available metrics reference

This documentation provides a guide to the metrics generated within Temporal Cloud environments. It enumerates the metrics, labels, and operations that enable you to analyze Workflow latencies, state transitions, gRPC errors, and more.

Most Temporal Cloud metrics are suffixed with _count, which indicates that they behave largely like a Prometheus counter. Use a function such as rate or increase to calculate a per-second rate of increase, or an extrapolated total increase over a time period.

rate(temporal_cloud_v0_frontend_service_request_count[5m])
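The rate example above has a natural counterpart that uses increase. The following is a minimal sketch, assuming that the operation label is present on this metric and that a one-hour window suits your dashboard; adjust both as needed.

```promql
# the approximate total number of frontend requests over the last hour, broken down by operation
sum by (operation) (increase(temporal_cloud_v0_frontend_service_request_count[1h]))
```

Because increase extrapolates from the samples inside the window, the result may be a non-integer value even though the underlying counter only ever grows by whole numbers.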

temporal_cloud_v0_service_latency has _bucket, _count, and _sum metrics because it is a Prometheus Histogram. You can use the _count and _sum metrics to calculate an average latency over a time period, or use the _bucket metric to calculate an approximate histogram quantile.

# the average latency observation over the last 5 minutes
rate(temporal_cloud_v0_service_latency_sum[5m]) / rate(temporal_cloud_v0_service_latency_count[5m])

# the approximate 99th percentile latency over the last 5 minutes, broken down by operation
histogram_quantile(0.99, sum(rate(temporal_cloud_v0_service_latency_bucket[5m])) by (le, operation))

Metrics labels

What labels can you use to filter metrics?

Metrics for all Namespaces in your account are available from the metrics endpoint. Use the following labels to filter metrics:

| Label | Explanation |
| --- | --- |
| le | Less than or equal to (le) is used in histograms to categorize observations into buckets based on their value being less than or equal to a predefined upper limit. |
| operation | This includes operations such as SignalWorkflowExecution, StartBatchOperation, StartWorkflowExecution, TaskQueueMgr, TerminateWorkflowExecution, UpdateNamespace, and UpdateSchedule. See: Metric Operations |
| resource_exhausted_cause | Cause for resource exhaustion. |
| task_type | Activity or Workflow. |
| temporal_account | Temporal Account. |
| temporal_namespace | Temporal Namespace. |
| temporal_service_type | Frontend, Matching, History, or Worker. |
| is_background | This label on temporal_cloud_v0_total_action_count indicates when actions are produced by a Temporal background job, for example: hourly Workflow Export. |
| namespace_mode | This label on temporal_cloud_v0_total_action_count indicates if actions are produced by an active vs a passive Namespace. For a regular Namespace, namespace_mode will always be “active”. |

The following is an example of how you can filter metrics using labels:

temporal_cloud_v0_poll_success_count{__rollup__="true", operation="TaskQueueMgr", task_type="Activity", temporal_account="12345", temporal_namespace="your_namespace.12345", temporal_service_type="matching"}
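Label matchers can also be combined with regular expressions and aggregation. The following sketch assumes a hypothetical naming convention in which production Namespaces start with prod-; that prefix is illustrative only and does not come from Temporal Cloud.

```promql
# per-second rate of successful polls over the last 5 minutes for every Namespace
# whose name starts with the hypothetical "prod-" prefix, split by Namespace and task type
sum by (temporal_namespace, task_type) (
  rate(temporal_cloud_v0_poll_success_count{temporal_namespace=~"prod-.*", operation="TaskQueueMgr"}[5m])
)
```

The =~ matcher is anchored in PromQL, so "prod-.*" matches only label values that begin with prod-.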

Available metrics

What metrics are emitted from Temporal Cloud?

The following metrics are emitted for your various Namespaces.

temporal_cloud_v0_frontend_service_error_count

This is a count of gRPC errors returned aggregated by operation.

temporal_cloud_v0_frontend_service_request_count

This is a count of gRPC requests received aggregated by operation.
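The two frontend counters can be combined into an error ratio. This is a sketch rather than a recommended alert: the 5-minute window is an assumption, and operations with no series in either counter will not appear in the result.

```promql
# fraction of frontend gRPC requests that returned an error over the last 5 minutes, per operation
sum by (operation) (rate(temporal_cloud_v0_frontend_service_error_count[5m]))
/
sum by (operation) (rate(temporal_cloud_v0_frontend_service_request_count[5m]))
```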

temporal_cloud_v0_poll_success_count

Tasks that are successfully matched to a poller.

temporal_cloud_v0_poll_success_sync_count

Tasks that are successfully sync matched to a poller.

temporal_cloud_v0_poll_timeout_count

Polls that timed out because no tasks became available for the poller.
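One way to use the three poll counters together is to compare them as ratios. The queries below are a sketch that assumes per-Namespace aggregation is what you want; the window length is illustrative.

```promql
# share of successfully matched tasks that were sync matched, per Namespace
sum by (temporal_namespace) (rate(temporal_cloud_v0_poll_success_sync_count[5m]))
/
sum by (temporal_namespace) (rate(temporal_cloud_v0_poll_success_count[5m]))

# per-second rate of polls that timed out without receiving a task, per Namespace and task type
sum by (temporal_namespace, task_type) (rate(temporal_cloud_v0_poll_timeout_count[5m]))
```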

temporal_cloud_v0_replication_lag_bucket

A histogram of replication lag during a specific time interval for a multi-region Namespace.

temporal_cloud_v0_replication_lag_count

The count of replication lag observations during a specific time interval for a multi-region Namespace.

temporal_cloud_v0_replication_lag_sum

The sum of replication lag during a specific time interval for a multi-region Namespace.
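Because the replication lag metrics form a Prometheus Histogram, the same patterns shown for service latency should apply. The following is a sketch under that assumption; the unit of the result is whatever unit the histogram records, which this reference does not state.

```promql
# the average replication lag observation over the last 5 minutes, per Namespace
sum by (temporal_namespace) (rate(temporal_cloud_v0_replication_lag_sum[5m]))
/
sum by (temporal_namespace) (rate(temporal_cloud_v0_replication_lag_count[5m]))

# the approximate 99th percentile replication lag over the last 5 minutes, per Namespace
histogram_quantile(0.99, sum by (le, temporal_namespace) (rate(temporal_cloud_v0_replication_lag_bucket[5m])))
```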

temporal_cloud_v0_resource_exhausted_error_count

gRPC requests received that were rate-limited by Temporal Cloud, aggregated by cause.
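To see why requests are being rate-limited, this counter can be grouped by the resource_exhausted_cause label. A minimal sketch, assuming a 5-minute window:

```promql
# per-second rate of rate-limited gRPC requests over the last 5 minutes, per Namespace and cause
sum by (temporal_namespace, resource_exhausted_cause) (
  rate(temporal_cloud_v0_resource_exhausted_error_count[5m])
)
```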

temporal_cloud_v0_schedule_action_success_count

Successful execution of a Scheduled Workflow.

temporal_cloud_v0_schedule_buffer_overruns_count

When average schedule run length is greater than average schedule interval while a buffer_all overlap policy is configured.

temporal_cloud_v0_schedule_missed_catchup_window_count

Skipped Scheduled executions when Workflows were delayed longer than the catchup window.

temporal_cloud_v0_schedule_rate_limited_count

Workflows that were delayed due to exceeding a rate limit.
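The schedule counters can be queried like any other _count metric. The sketch below assumes that the temporal_namespace label is available on these metrics and that hourly totals are a useful granularity for Schedules.

```promql
# approximate number of Scheduled executions skipped in the last hour
# because they fell outside the catchup window, per Namespace
sum by (temporal_namespace) (increase(temporal_cloud_v0_schedule_missed_catchup_window_count[1h]))

# per-second rate of Scheduled Workflows delayed by a rate limit, per Namespace
sum by (temporal_namespace) (rate(temporal_cloud_v0_schedule_rate_limited_count[5m]))
```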

temporal_cloud_v0_service_latency_bucket

Latency for SignalWithStartWorkflowExecution, SignalWorkflowExecution, StartWorkflowExecution operations.

temporal_cloud_v0_service_latency_count

Count of latency observations for SignalWithStartWorkflowExecution, SignalWorkflowExecution, StartWorkflowExecution operations.

temporal_cloud_v0_service_latency_sum

Sum of latency observation time for SignalWithStartWorkflowExecution, SignalWorkflowExecution, StartWorkflowExecution operations.

temporal_cloud_v0_state_transition_count

Count of state transitions for each Namespace.

temporal_cloud_v0_total_action_count

Approximate count of Temporal Cloud Actions.
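The state transition and Action counters are often read as per-second rates. The following is a sketch; grouping by is_background and namespace_mode avoids assuming any particular label values beyond those described in the labels table.

```promql
# state transitions per second over the last 5 minutes, per Namespace
sum by (temporal_namespace) (rate(temporal_cloud_v0_state_transition_count[5m]))

# approximate Actions per second over the last 5 minutes, per Namespace,
# split by whether a background job produced them and by Namespace mode
sum by (temporal_namespace, is_background, namespace_mode) (
  rate(temporal_cloud_v0_total_action_count[5m])
)
```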

temporal_cloud_v0_workflow_cancel_count

Workflows canceled before completing execution.

temporal_cloud_v0_workflow_continued_as_new_count

Workflow Executions that were Continued-As-New from a past execution.

temporal_cloud_v0_workflow_failed_count

Workflows that failed before completion.

temporal_cloud_v0_workflow_success_count

Workflows that successfully completed.

temporal_cloud_v0_workflow_terminate_count

Workflows terminated before completing execution.

temporal_cloud_v0_workflow_timeout_count

Workflows that timed out before completing execution.
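The Workflow outcome counters can be compared with one another. The sketch below considers only failed and successful Workflows; canceled, terminated, and timed-out Workflows are deliberately left out to keep the query short, and a Namespace appears in the result only if it has series for both counters.

```promql
# failed Workflows as a fraction of failed plus successful Workflows
# over the last 5 minutes, per Namespace
sum by (temporal_namespace) (rate(temporal_cloud_v0_workflow_failed_count[5m]))
/
(
    sum by (temporal_namespace) (rate(temporal_cloud_v0_workflow_failed_count[5m]))
  + sum by (temporal_namespace) (rate(temporal_cloud_v0_workflow_success_count[5m]))
)
```

When neither counter increased during the window, the division is 0/0 and the query returns NaN for that Namespace.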

Metrics groups

How are metrics grouped and which groups support filtering?

The following lists enumerate the groups of metrics that support operation-level filtering:

Frontend Service metrics

  • temporal_cloud_v0_frontend_service_request_count
  • temporal_cloud_v0_frontend_service_error_count
  • temporal_cloud_v0_frontend_service_pending_requests_count

Poll metrics

  • temporal_cloud_v0_poll_success_count
  • temporal_cloud_v0_poll_success_sync_count
  • temporal_cloud_v0_poll_timeout_count

Service latency metrics

  • temporal_cloud_v0_service_latency_bucket
  • temporal_cloud_v0_service_latency_count
  • temporal_cloud_v0_service_latency_sum

Workflow metrics

  • temporal_cloud_v0_workflow_cancel_count
  • temporal_cloud_v0_workflow_continued_as_new_count
  • temporal_cloud_v0_workflow_failed_count
  • temporal_cloud_v0_workflow_success_count
  • temporal_cloud_v0_workflow_terminate_count
  • temporal_cloud_v0_workflow_timeout_count

Additional groups that do not support operation-level filtering include:

Replication lag metrics

  • temporal_cloud_v0_replication_lag_bucket
  • temporal_cloud_v0_replication_lag_count
  • temporal_cloud_v0_replication_lag_sum

Scheduling metrics

  • temporal_cloud_v0_schedule_action_success_count
  • temporal_cloud_v0_schedule_buffer_overruns_count
  • temporal_cloud_v0_schedule_missed_catchup_window_count
  • temporal_cloud_v0_schedule_rate_limited_count

Ungrouped metrics include:

  • temporal_cloud_v0_resource_exhausted_error_count
  • temporal_cloud_v0_state_transition_count
  • temporal_cloud_v0_total_action_count

Metrics operations

What operations are available on Temporal Cloud to use with metrics?

  • AdminDescribeMutableState
  • AdminGetWorkflowExecutionRawHistory
  • AdminGetWorkflowExecutionRawHistoryV2
  • AdminReapplyEvents
  • CountWorkflowExecutions
  • CreateSchedule
  • DeleteSchedule
  • DeleteWorkflowExecution
  • DescribeBatchOperation
  • DescribeNamespace
  • DescribeSchedule
  • DescribeTaskQueue
  • DescribeWorkflowExecution
  • GetWorkerBuildIdCompatibility
  • GetWorkerTaskReachability
  • GetWorkflowExecutionHistory
  • GetWorkflowExecutionHistoryReverse
  • ListBatchOperations
  • ListClosedWorkflowExecutions
  • OperatorDeleteNamespace
  • PatchSchedule
  • PollActivityTaskQueue
  • PollWorkflowExecutionHistory
  • PollWorkflowExecutionUpdate
  • PollWorkflowTaskQueue
  • QueryWorkflow
  • RecordActivityTaskHeartbeat
  • RecordActivityTaskHeartbeatById

Operations on metrics groups

Which Temporal Cloud metrics groups are available for use with operations?

As the following table shows, certain metrics groups support operations for aggregation, filtering, and so forth:

| Metrics Group / Operations | All Operations | SignalWithStartWorkflowExecution / SignalWorkflowExecution / StartWorkflowExecution | TaskQueueMgr | CompletionStats |
| --- | --- | --- | --- | --- |
| Frontend Service Metrics | X | | | |
| Service Latency Metrics | | X | | |
| Poll Metrics | | | X | |
| Workflow Metrics | | | | X |
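Operation filters for each metrics group might look like the following sketch. It assumes that Frontend Service metrics accept any listed operation, that Service Latency metrics carry only the three start and signal operations, that Poll metrics use TaskQueueMgr, and that Workflow metrics use CompletionStats; confirm the label values against your own metrics endpoint before relying on them.

```promql
# Frontend Service metrics: filter by any listed operation
rate(temporal_cloud_v0_frontend_service_request_count{operation="DescribeNamespace"}[5m])

# Service Latency metrics: limited to the start and signal operations
rate(temporal_cloud_v0_service_latency_count{operation=~"SignalWithStartWorkflowExecution|SignalWorkflowExecution|StartWorkflowExecution"}[5m])

# Poll metrics: recorded under the TaskQueueMgr operation
rate(temporal_cloud_v0_poll_success_count{operation="TaskQueueMgr"}[5m])

# Workflow metrics: recorded under the CompletionStats operation (assumed from the table)
rate(temporal_cloud_v0_workflow_success_count{operation="CompletionStats"}[5m])
```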