Exports Cloudflare Containers metrics to Datadog using the Cloudflare GraphQL Analytics API. Runs as a Cloudflare Workflow triggered every minute to scrape metrics for all containers in an account.
- Node.js 24+
- npm
git clone https://github.com/flarelabs-net/cloudflare-container-metrics-datadog-exporter.git
cd cloudflare-container-metrics-datadog-exporter
npm installnpm run token-setupThis will prompt for your Cloudflare Account ID, open the browser to create an API token with the required permissions, and collect your Datadog API Key. It automatically creates a .dev.vars file.
To run GraphQL queries from a specific jurisdiction (closer to the data source), set the JURISDICTION variable in wrangler.jsonc:
This uses a Durable Object to proxy requests from the specified jurisdiction.
Add custom tags to all metrics by setting the DATADOG_TAGS variable in wrangler.jsonc:
"vars": {
"BATCH_SIZE": 5000,
"RETRY_LIMIT": 3,
"RETRY_DELAY_SECONDS": 1,
"JURISDICTION": "eu",
"DATADOG_TAGS": {
"env": "production",
"team": "platform",
"service": "containers"
}
}These tags will be added to all health and resource metrics sent to Datadog.
npx wrangler devIn another terminal run:
curl "http://localhost:8787/cdn-cgi/handler/scheduled"
You should see an "Ok" response from the curl and logs from the exporter.
npx wrangler deploy
npx wrangler secret bulk .dev.varsAll metrics are prefixed with cloudflare.containers..
| Metric | Type | Description |
|---|---|---|
instances.active |
gauge | Number of active instances |
instances.healthy |
gauge | Number of healthy instances |
instances.stopped |
gauge | Number of stopped instances |
instances.failed |
gauge | Number of failed instances |
instances.scheduling |
gauge | Number of scheduling instances |
instances.starting |
gauge | Number of starting instances |
instances.max |
gauge | Max instances configured |
Tags: account_id, application_id, application_name
| Metric | Type | Description |
|---|---|---|
instances.total.active |
gauge | Total active instances |
instances.total.healthy |
gauge | Total healthy instances |
instances.total.stopped |
gauge | Total stopped instances |
instances.total.failed |
gauge | Total failed instances |
instances.total.scheduling |
gauge | Total scheduling instances |
instances.total.starting |
gauge | Total starting instances |
instances.total.max |
gauge | Total max instances |
Tags: account_id
Use instances.total.max - instances.total.healthy to calculate available capacity.
| Metric | Type | Unit | Description |
|---|---|---|---|
cpu |
gauge | vCPU | CPU load |
memory |
gauge | MiB | Memory usage |
disk |
gauge | MB | Disk usage |
bandwidth.rx |
count | bytes | Bytes received |
bandwidth.tx |
count | bytes | Bytes transmitted |
Tags: account_id, application_id, application_name, version, instance_id, placement_id, stat
version- The container application version numberinstance_id- The instance identifier (maps to Cloudflare's deploymentId)placement_id- The placement identifier (specific realization of an instance, useful for tracking restarts/churn)stat- The aggregation type:p50,p90,p99,max(bandwidth metrics don't have a stat tag)
A pre-built dashboard is included in datadog-dashboard.json. To import it:
- In Datadog, go to Dashboards → New Dashboard → New Dashboard
- Click the cog icon (⚙️) in the top right
- Select Import dashboard JSON
- Paste the contents of
datadog-dashboard.json
See Datadog's documentation for more details.
The exporter runs as a Cloudflare Workflow triggered every minute via cron. Each workflow step uses configurable retry settings:
- Retries: Configurable via
RETRY_LIMIT(default: 3 attempts) - Delay: Configurable via
RETRY_DELAY_SECONDS(default: 1 second initial delay) - Backoff: Exponential (e.g., 1s, 2s, 4s)
Steps will automatically retry on transient failures (API errors, network issues).
| Variable | Type | Default | Description |
|---|---|---|---|
BATCH_SIZE |
number | 5000 | Maximum metrics per Datadog API request |
RETRY_LIMIT |
number | 3 | Number of retry attempts for failed workflow steps |
RETRY_DELAY_SECONDS |
number | 1 | Initial delay in seconds before retry (exponential backoff) |
JURISDICTION |
string | "" | Durable Object jurisdiction for GraphQL queries (e.g., "eu", "fedramp") |
DATADOG_TAGS |
object | {} | Custom tags to add to all metrics |