Getting the most out of the Billable Action Count metric

Temporal Cloud recently introduced a new metric: temporal_cloud_v1_billable_action_count. It tracks the per-second rate of billable Actions, broken down by Action Type and Workflow Type. This makes it possible to answer questions that were previously difficult to investigate: which Workflows are consuming the most Actions, what types of Actions are driving costs, and whether Workflow behavior has changed after a deployment.

This post covers practical ways to use the metric for cost analysis, debugging, and alerting.

What this metric is, and isn’t

Three things to keep in mind before building dashboards and alerts around the Billable Action Count metric:

This metric is not your bill. The values are usage estimates based on observed Action rates, aggregated into one-minute windows. Your actual Action charges may differ due to rounding, timing boundaries, pricing-tier calculations, and other bill components such as storage and support. Use this metric for directional analysis and relative comparisons, not invoice reconciliation.
Not all Action Types are included. Some Action Types, such as Export Workflow History, Enable Fairness, and Enable Provisioned Capacity, are excluded. Check the Actions reference for the complete list of what is and is not captured.
This metric is aggregated, not per execution. The data is broken down by Workflow Type and Action Type, but not by individual Workflow Execution. You can see the overall rate of Actions for a Workflow Type, but not what a specific execution contributed. Use this metric to identify trends and anomalies at the aggregate level, then use the Temporal UI to drill into specific executions within the relevant time window. To estimate the Actions in a single Workflow Execution, see Workflow History.

With those caveats in mind, this metric is the most granular tool available for understanding Action consumption patterns in near real time.

Usage optimization analysis

The most immediate application of this metric is understanding which parts of your system consume the most Actions.

Which Workflow Type generates the most Actions?

topk(10,
  sum by (temporal_workflow_type) (
    temporal_cloud_v1_billable_action_count{
      temporal_namespace="$namespace"
    }
  )
)

This ranks Workflow Types by current Action rate, in Actions per second, within a Namespace. If one Workflow Type accounts for a disproportionate share, that is where optimization efforts will have the largest impact.

Which Action Type generates the most Actions?

topk(10,
  sum by (action_type) (
    temporal_cloud_v1_billable_action_count{
      temporal_namespace="$namespace"
    }
  )
)

This identifies what kind of work is driving consumption. A high volume of retry_activity Actions may indicate a timeout or configuration issue rather than a design problem. Elevated signal_workflow rates could point to an upstream integration sending more Signals than necessary.

Combine both dimensions

topk(20,
  sum by (temporal_workflow_type, action_type) (
    temporal_cloud_v1_billable_action_count{
      temporal_namespace="$namespace"
    }
  )
)

This is often where the most useful findings surface. It helps answer questions such as:

Is this Workflow expensive because it retries a lot?
Is this Workflow expensive because it starts many Child Workflows?
Is this Workflow expensive because it Heartbeats too frequently?

For example, you might discover that a single Workflow Type accounts for a large share of record_activity_heartbeat Actions, and that those Heartbeats are firing every second when every 30 seconds would be sufficient.
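To put rough numbers on the Heartbeat example, here is a minimal Python sketch. It assumes every Heartbeat interval produces exactly one record_activity_heartbeat Action for a continuously running Activity; real volumes may be lower if Heartbeats are throttled or the Activity is short-lived. The function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600

def heartbeat_actions_per_hour(interval_seconds):
    """Heartbeat Actions per hour for one continuously running Activity.

    Assumption: one Action per Heartbeat interval, with no throttling.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_HOUR / interval_seconds

every_second = heartbeat_actions_per_hour(1)      # 3600.0
every_30_seconds = heartbeat_actions_per_hour(30)  # 120.0

# Relative reduction from moving to a 30-second interval, in percent.
reduction = (every_second - every_30_seconds) / every_second * 100
print(every_second, every_30_seconds, round(reduction, 1))
```

Under that assumption, the longer interval removes roughly 97% of the Heartbeat Actions for that Activity.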

[Figure: Horizontal bar chart showing top Actions by Workflow Type and Action Type.]

Estimate total Actions over the last 24 hours

Since this metric is a per-second rate, you can estimate total Actions by taking the average rate over a time window and multiplying by the number of seconds. This is often the more useful query for cost considerations because it smooths out temporary spikes and gives a daily estimate.

sum(
  avg_over_time(
    temporal_cloud_v1_billable_action_count{
      temporal_namespace="$namespace"
    }[24h:1m]
  )
) * 86400

avg_over_time gives the average per-second rate over the past 24 hours, and * 86400 converts that to an approximate Action count because there are 86,400 seconds in a day.
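The same rate-to-count conversion can be sketched in Python for a single series. The sample rates are made-up values rather than real metric output, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def estimate_daily_actions(per_second_rates):
    """Approximate a daily Action count from per-second rate samples.

    Mirrors avg_over_time(...) * 86400: average the rates, then scale
    by the number of seconds in a day.
    """
    if not per_second_rates:
        return 0.0
    average_rate = sum(per_second_rates) / len(per_second_rates)
    return average_rate * SECONDS_PER_DAY

# Hypothetical samples: mostly 10 Actions/s with one short spike to 50.
samples = [10.0, 10.0, 50.0, 10.0]
print(estimate_daily_actions(samples))  # average of 20 Actions/s -> 1728000.0
```

Averaging first is what keeps a brief spike from dominating the estimate: the spike to 50 raises the average to 20, not the daily figure to 50 * 86,400.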

[Figure: Line chart showing a raw Action rate compared with a 24-hour smoothed average and daily Action estimate.]

To break this down by Workflow Type and Action Type:

topk(20,
  86400 *
  sum by (temporal_workflow_type, action_type) (
    avg_over_time(
      temporal_cloud_v1_billable_action_count{
        temporal_namespace="$namespace"
      }[24h:1m]
    )
  )
)

Acting on the results

Once you know where Actions are concentrated, the Cost Optimization guide provides targeted strategies for each Action Type.

Debugging Workflow execution paths

The Temporal UI lets you list all executions of a Workflow Type, but for complex Workflows with conditional logic and branching, it can be difficult to determine which executions followed which path just by scanning a list.

The Billable Action Count metric offers a different angle. By observing the occurrence of specific Action Types over time, you can infer which code paths your Workflows are taking without adding custom instrumentation.

Example: finding a rare bug hidden among thousands of successful executions

Suppose you have an OrderFulfillment Workflow that processes thousands of orders per hour. A rare edge case, such as an order with a missing shipping address, causes the Workflow to enter an unexpected path before completing:

start_timer, waiting for customer input
signal_external_workflow, notifying a support Workflow

This only affects 0.1% of orders. Success rates stay at 99.9%, no alerts fire, and no one files a ticket. But the metric reveals it:

temporal_cloud_v1_billable_action_count{
  temporal_namespace="$namespace",
  temporal_workflow_type="OrderFulfillment",
  action_type="signal_external_workflow"
}

If signal_external_workflow should be near zero for this Workflow Type, even a small persistent rate stands out on a graph. You can pinpoint when it started, correlate it with a recent deployment, and search the UI for executions in that time window to find the specific orders triggering the edge case.

[Figure: Stacked time series chart showing OrderFulfillment Actions by type, with signal_external_workflow increasing after a deployment.]

This is the kind of issue that is nearly impossible to find by scanning execution lists, but becomes visible at the metric level.

Example: spotting a deployment regression that adds extra Actions

A SubscriptionRenewal Workflow normally runs two Activities: charge payment and send confirmation. That is two schedule_activity Actions per execution.

During a refactor, a developer adds input validation as a separate Activity instead of handling it inline in the Workflow code. Tests pass, renewals complete successfully, and customers are charged correctly. But now every execution generates three schedule_activity Actions instead of two.

Across 10,000 renewals per hour, that is 10,000 extra Actions per hour, and nothing in your error rates, success metrics, or logs indicates a problem.

Start by graphing the Action breakdown over time:

sum by (action_type) (
  temporal_cloud_v1_billable_action_count{
    temporal_namespace="$namespace",
    temporal_workflow_type="SubscriptionRenewal"
  }
)

This shows the instantaneous per-second rate for each Action Type. On a graph, you would see the schedule_activity line jump on the date of the deployment. This view is useful for identifying when something changed, but the value fluctuates with traffic. A spike during peak hours does not necessarily mean a regression.

[Figure: Time series chart showing SubscriptionRenewal Actions by type, with schedule_activity jumping after a deployment.]

To confirm, compare the average rate across two time periods using avg_over_time:

sum by (action_type) (
  avg_over_time(
    temporal_cloud_v1_billable_action_count{
      temporal_namespace="$namespace",
      temporal_workflow_type="SubscriptionRenewal"
    }[1d:1m]
  )
)

This averages the rate over a full day, smoothing out hourly traffic patterns. Run it for a day before the deployment and a day after. If schedule_activity averaged 20 per second before and 30 per second after, the 50% increase confirms the regression, independent of when during the day you check.
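The before-and-after comparison is simple arithmetic. A minimal Python sketch, using the illustrative figures from this example (the function name is hypothetical):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive to compute a relative change")
    return (after - before) / before * 100.0

# Daily averages from the example: 20 Actions/s before, 30 Actions/s after.
print(percent_change(20.0, 30.0))  # 50.0

# Per-execution view: 2 schedule_activity Actions before, 3 after.
print(percent_change(2, 3))  # 50.0

# Extra Actions per hour at 10,000 renewals per hour, one extra Activity each.
extra_actions_per_hour = 10_000 * (3 - 2)
print(extra_actions_per_hour)  # 10000
```

The daily averages and the per-execution counts give the same 50% figure, which is why the increase points to a change in Workflow behavior and not to a change in traffic.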

From there, you can review the diff, decide whether the validation warrants its own Activity, and make an informed choice instead of discovering the cost impact on next month’s invoice.

Alerting

The Billable Action Count metric works well as the basis for cost and behavior alerts that adapt as traffic grows. The examples below are starting points. Adjust the comparison windows and ratios to match your workload patterns.

The queries below append or vector(0) so that an expression returns 0 instead of an empty result, which prevents false no-data alerts when a Namespace or Workflow Type has no traffic.

Alert on Action volume spikes relative to baseline

(
  sum(temporal_cloud_v1_billable_action_count{temporal_namespace="$namespace"})
  or vector(0)
)
>
2 * avg_over_time(
  (
    sum(temporal_cloud_v1_billable_action_count{temporal_namespace="$namespace"})
    or vector(0)
  )[7d:1m]
)

This fires when the current Action rate for a Namespace exceeds 2x its 7-day average. It adapts to traffic growth automatically. As your baseline increases, the alert threshold increases with it. Possible causes for a spike include retry storms, runaway Workflows, or an upstream system flooding Workflows with Signals.
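To illustrate why a relative threshold adapts to growth, here is a small Python sketch of the same comparison. The rates are invented for illustration, and is_spike is a hypothetical helper, not part of any Temporal tooling.

```python
def is_spike(current_rate, baseline_rates, multiplier=2.0):
    """Return True when current_rate exceeds multiplier x the baseline average.

    An empty baseline is treated as 0, similar in spirit to `or vector(0)`.
    """
    if baseline_rates:
        baseline = sum(baseline_rates) / len(baseline_rates)
    else:
        baseline = 0.0
    return current_rate > multiplier * baseline

# Baseline averaging 100 Actions/s: 250 is a spike, 150 is not.
print(is_spike(250.0, [100.0, 120.0, 80.0]))  # True
print(is_spike(150.0, [100.0, 120.0, 80.0]))  # False

# After traffic doubles, the same 250 Actions/s no longer fires.
print(is_spike(250.0, [200.0, 240.0, 160.0]))  # False

# No traffic at all: 0 is not greater than 2 * 0, so nothing fires.
print(is_spike(0.0, []))  # False
```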

This alert is designed for catching cost and behavioral anomalies below Temporal Cloud’s system-level limits. For monitoring against RPS and APS rate limits, see the service health documentation.

[Figure: Line chart showing Namespace total Actions spiking above a 2x baseline alert threshold.]

The same pattern works at the Workflow Type level:

(
  sum by (temporal_workflow_type) (
    temporal_cloud_v1_billable_action_count{temporal_namespace="$namespace"}
  ) or vector(0)
)
>
2 * avg_over_time(
  (
    sum by (temporal_workflow_type) (
      temporal_cloud_v1_billable_action_count{temporal_namespace="$namespace"}
    ) or vector(0)
  )[7d:1m]
)

This catches scenarios such as a code change that adds Activities to a hot path, or a configuration change that increases Signal frequency for a single Workflow Type.

Alert when retries exceed a percentage of total Actions

(
  sum(
    temporal_cloud_v1_billable_action_count{
      temporal_namespace="$namespace",
      action_type=~"retry_activity|retry_activity_local|retry_standalone_activity"
    }
  )
  or vector(0)
)
/
clamp_min(
  (
    sum(temporal_cloud_v1_billable_action_count{temporal_namespace="$namespace"})
    or vector(0)
  ),
  0.001
)
>
0.25

This fires when retries account for more than 25% of total Actions in a Namespace. In a healthy system, retries are typically a small fraction of overall volume. A ratio-based alert scales with traffic. Whether you are processing 100 or 10,000 Actions per second, a 25% retry rate signals a systemic issue: a downstream dependency failing, a misconfigured timeout, or a deployment introducing a new failure mode.
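The ratio and its divide-by-zero guard can be sketched in Python as follows. The rates are invented for illustration, and retry_ratio is a hypothetical helper that mirrors the clamp_min(..., 0.001) step in the query.

```python
def retry_ratio(retry_rate, total_rate, floor=0.001):
    """Share of Actions that are retries, with the denominator floored.

    max(total_rate, floor) plays the role of clamp_min in the PromQL query:
    it avoids dividing by zero when a Namespace has no traffic.
    """
    return retry_rate / max(total_rate, floor)

THRESHOLD = 0.25

# 30 retries/s out of 100 Actions/s: above the threshold.
print(retry_ratio(30.0, 100.0) > THRESHOLD)  # True

# 2,000 retries/s out of 10,000 Actions/s: 20%, below the threshold.
print(retry_ratio(2000.0, 10000.0) > THRESHOLD)  # False

# No traffic: the ratio is 0.0, so the alert stays quiet.
print(retry_ratio(0.0, 0.0))  # 0.0
```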

[Figure: Line chart showing retry ratio crossing a 25% alert threshold.]

Other ways to use this metric

Beyond usage analysis, debugging, and alerting:

Usage planning: Track Action rates over weeks or months to forecast growth. Use avg_over_time with longer windows, such as 7d or 30d, to smooth daily variation and identify trends.
Measuring optimization impact: After applying a change from the Cost Optimization guide, such as reducing Heartbeat frequency, tuning Retry Policies, or switching to Local Activities, use the avg_over_time query from the debugging section to compare the Action rate for the affected Workflow Type before and after the change. If you reduced Heartbeats from every 1 second to every 30 seconds, you should see record_activity_heartbeat drop proportionally. This closes the loop: identify the hotspot, apply the fix, and verify the result.
Per-team usage attribution: If Workflow Type names follow a convention that includes team or service identifiers, label matching enables cost breakdowns:

sum by (temporal_workflow_type) (
  temporal_cloud_v1_billable_action_count{
    temporal_namespace="$namespace",
    temporal_workflow_type=~"payments-.*"
  }
)

Estimating usage per business transaction: If you know your system processes roughly N orders per hour, dividing the estimated Actions for a period by the business volume in the same period yields an Actions-per-transaction estimate, which is useful for reporting infrastructure costs to nontechnical stakeholders.
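As a rough illustration of the per-transaction estimate, the Python sketch below divides a daily Action estimate by daily order volume. Both input figures are hypothetical, as is the function name.

```python
def actions_per_transaction(estimated_actions, transactions):
    """Average Actions per business transaction over the same time period."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return estimated_actions / transactions

# Hypothetical inputs: 1,728,000 estimated Actions per day,
# and roughly 2,000 orders per hour, or 48,000 orders per day.
daily_actions = 1_728_000
daily_orders = 2_000 * 24

print(actions_per_transaction(daily_actions, daily_orders))  # 36.0
```

Both figures need to cover the same window. Mixing a daily Action estimate with an hourly order count would inflate the result by a factor of 24.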

Operational reminders

Query conventions: Metrics arrive as precomputed per-second rates with delta temporality. Do not use rate(), increase(), or irate(). Use sum(), avg(), max(), and min() instead. See the OpenMetrics migration guide.
Data latency: Metric data points are available within 3 minutes of origination. Factor this into alert evaluation windows and dashboard refresh intervals. See the OpenMetrics API reference.
Rate limits: The OpenMetrics endpoint supports 180 requests per account per hour, an average of 3 requests per minute. Exceeding this returns HTTP 429 with a Retry-After header. A single scraper on a 30-second interval makes 120 requests per hour, which stays within the limit. See the API limits.
Max data points per scrape: Responses include up to 30,000 data points. Truncated responses set the X-Completeness header to limited. See the API limits.
Cardinality management: temporal_cloud_v1_billable_action_count is a high-cardinality metric. Every combination of Namespace, Workflow Type, and Action Type produces a separate time series. If you are approaching the 30,000 data point limit, filter at scrape time with /v1/metrics?namespaces=prod-*, drop high-cardinality labels post-scrape with metric_relabel_configs, and monitor series count with count({__name__=~"temporal_cloud_v1_.*"}). See Managing high cardinality.
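The rate-limit and data-point figures above lend themselves to a quick budget check. The Python sketch below uses the two limits stated in this section; the helper names and the example label counts are hypothetical, and the series estimate is a worst-case upper bound because not every combination of labels will actually occur.

```python
REQUESTS_PER_HOUR_LIMIT = 180        # per account, from the rate limit above
MAX_DATA_POINTS_PER_SCRAPE = 30_000  # from the data point limit above

def requests_per_hour(scrape_interval_seconds, scrapers=1):
    """Requests per hour made by `scrapers` scrapers sharing one account."""
    return scrapers * 3600 / scrape_interval_seconds

def worst_case_series(namespaces, workflow_types, action_types):
    """Upper bound on series: every label combination is assumed to exist."""
    return namespaces * workflow_types * action_types

# One scraper at 30 s stays under the limit; two do not.
print(requests_per_hour(30) <= REQUESTS_PER_HOUR_LIMIT)              # True
print(requests_per_hour(30, scrapers=2) <= REQUESTS_PER_HOUR_LIMIT)  # False

# Hypothetical footprint: 5 Namespaces, 200 Workflow Types, 40 Action Types.
series = worst_case_series(5, 200, 40)
print(series, series > MAX_DATA_POINTS_PER_SCRAPE)  # 40000 True
```

Because the request limit applies per account, a second Prometheus replica or an ad hoc script scraping the same endpoint counts against the same budget.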

Summary

temporal_cloud_v1_billable_action_count provides granular visibility into what drives Action consumption in Temporal Cloud. The combination of Action Type and Workflow Type labels makes it possible to pinpoint cost hotspots, debug Workflow behavior, detect behavioral changes, and alert on conditions that matter to both reliability and cost. It is not a substitute for your invoice. For invoice-level accuracy, check the Usage Dashboard and the Billing API.

Start with the usage analysis queries to establish a baseline, set up alerts for the Action Types most relevant to your system, and revisit the numbers after each optimization to measure impact.